Protein structure directly affects protein function and is closely tied to protein evolution. Structural analyses indicate that about 14% of eukaryotic proteins contain internal repeat substructures. These substructures affect the stability and interactions of a protein, and they are also more biologically meaningful than other parts of a protein structure with respect to evolution. It is therefore worthwhile to study the kinds of protein internal repeat substructures and the features that relate them to protein function. In our research, we have found that some protein internal repeat units (IRUs) differ in size because they contain different numbers of residues, yet have very similar conformations. Comparing such units with traditional methods alone leads to misclassification. The analysis system we have built can already identify internal repeat structures by sequence alignment. This project will complete the structure alignment component and add topological information about the protein structure: topology analysis will extract the features of each structural unit, and a support vector machine will then use those features to identify protein internal repeat substructures.

In this project, we will develop the theory of repeat structure analysis and complete an online query and comparison system for protein internal repeat substructures.
The system will automatically identify unknown repeat sequences or structures using either sequence or structure alignment. Successfully identified IRUs will be added automatically to the IRU database, which should further improve recognition accuracy. The resulting system will let researchers in the field retrieve and recognize protein internal repeat substructures, and it can be applied in areas such as protein interaction networks and drug design.