Feature Selection for Identifying Protein Disordered Regions

doi:10.4015/S1016237210001839


Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/59913

Title:	Feature Selection for Identifying Protein Disordered Regions
Authors:	Hsu, Hui-Huang; Hsieh, Cheng-Wei
Contributor:	Department of Computer Science and Information Engineering, Tamkang University
Keywords:	Disordered protein region; k-Medoids clustering; Feature selection; Proteomics
Date:	2010-04
Uploaded:	2011-10-05 22:26:29 (UTC+8)
Publisher:	Singapore: World Scientific Publishing Co. Pte. Ltd.
Abstract:	Determining the structure of a protein is not an easy task; it usually involves a time-consuming and costly process in the wet lab. It is therefore desirable to use computational methods to predict a protein's tertiary structure from its primary structure (the amino acid sequence). Disordered regions are segments of a protein that do not have a fixed conformation, which makes structure prediction harder. These disordered regions are also functionally important for a protein. In this research, we aim to identify such regions, with a focus on selecting a proper feature set. Three feature selection methods, namely F-score, information gain (IG), and k-medoids clustering, are used for feature selection, and a support vector machine (SVM) is then used for classification. The results show that classification accuracy can be raised with a smaller feature set. The k-medoids clustering feature selection reduces the number of features from 440 to 150 and improves the accuracy from 84.66% to 86.81% in five-fold cross-validation. It also performs more stably than F-score and IG.
Published in:	Biomedical Engineering: Applications, Basis and Communications 22(2), pp. 119-125
DOI:	10.4015/S1016237210001839
Appears in Collections:	[Department of Computer Science and Information Engineering] Journal Articles
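The record does not define how the three selection criteria were computed. The sketch below, in plain Python, assumes the common F-score definition (between-class scatter over within-class scatter of one feature in a binary problem) and one plausible reading of k-medoids feature selection: cluster the feature columns using the distance 1 - |Pearson r| and keep each cluster's medoid as its representative, so k = 150 would correspond to the 440 to 150 reduction reported in the abstract. The function names `f_score`, `top_k_features`, `correlation_distance`, and `k_medoids_features` are hypothetical; this is not the authors' code, and information gain and the SVM step are omitted.

```python
import math
import random
from typing import Dict, List, Sequence

# Assumption: rows are samples (e.g. residue windows), columns are features.
Matrix = Sequence[Sequence[float]]


def f_score(X: Matrix, y: Sequence[int]) -> List[float]:
    """F-score of each feature for a binary task (label 1 vs. any other label).

    Between-class scatter divided by within-class scatter; larger values mean
    the feature separates the classes better. Each class needs >= 2 samples.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label != 1]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        mean_all = sum(row[j] for row in X) / len(X)
        mean_p, mean_n = sum(p) / len(p), sum(n) / len(n)
        between = (mean_p - mean_all) ** 2 + (mean_n - mean_all) ** 2
        within = (sum((v - mean_p) ** 2 for v in p) / (len(p) - 1)
                  + sum((v - mean_n) ** 2 for v in n) / (len(n) - 1))
        if within > 0:
            scores.append(between / within)
        else:
            # Both classes constant: perfectly separating only if means differ.
            scores.append(math.inf if between > 0 else 0.0)
    return scores


def top_k_features(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (filter-style selection)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def correlation_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - |Pearson r|: close to 0 for redundant (strongly correlated) features."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((z - mb) ** 2 for z in b))
    if sa == 0 or sb == 0:
        return 1.0  # a constant feature is treated as unrelated to the others
    return 1.0 - abs(cov / (sa * sb))


def k_medoids_features(X: Matrix, k: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Cluster the feature columns with k-medoids; return the k medoid indices.

    Each medoid is a real feature standing in for its cluster, so the
    returned indices form the reduced feature set.
    """
    cols = list(zip(*X))
    m = len(cols)
    dist = [[0.0 if i == j else correlation_distance(cols[i], cols[j])
             for j in range(m)] for i in range(m)]
    medoids = random.Random(seed).sample(range(m), k)
    for _ in range(max_iter):
        # Assignment step: each feature joins its nearest medoid
        # (a medoid always stays in its own cluster, so none is empty).
        clusters: Dict[int, List[int]] = {med: [] for med in medoids}
        for i in range(m):
            nearest = i if i in clusters else min(medoids, key=lambda med: dist[i][med])
            clusters[nearest].append(i)
        # Update step: new medoid minimises total distance within its cluster.
        new_medoids = [min(members, key=lambda c: sum(dist[c][o] for o in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)
```

In a full pipeline, the selected column indices would be used to slice the data before training an SVM, for example with scikit-learn's `SVC` and `cross_val_score(..., cv=5)` for five-fold cross-validation; that library choice is likewise an assumption. The k-medoids variant keeps one representative per group of redundant features, whereas F-score ranks each feature independently and can keep several near-duplicates.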

Files in This Item:

File	Description	Size	Format	Views
1016-2372_22(2)p119-125.pdf		299Kb	Adobe PDF	343

All items in the institutional repository are protected by their original copyright.
