Protein Secondary Structure Prediction Using Artificial Neural Networks


Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/35231

Title:	Protein secondary structure prediction using artificial neural networks
Other Titles:	Protein secondary structure prediction by artificial neural networks
Authors:	張建蒼;Chang, Jian-tsang
Contributors:	Hsu, Hui-huang (許輝煌); Master's Program, Department of Computer Science and Information Engineering, Tamkang University
Keywords:	Artificial neural networks; Protein secondary structure; Gamma neural network; Protein secondary structure prediction
Date:	2006
Issue Date:	2010-01-11 06:14:21 (UTC+8)
Abstract:	The composition of proteins in an organism controls its functioning, and an organism contains hundreds or even thousands of proteins. It is therefore desirable to understand the function of each protein and how proteins interact with one another. This field is called proteomics; it studies the functions of the proteins in an organism. A protein's function is determined by its structure. X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy are two techniques for visualizing the three-dimensional structure of a protein. However, both are expensive and time-consuming: determining a protein's structure with either technique takes weeks or even months. Another problem with these techniques is resolution, since structural details may be missing from the results.
In contrast, with advances in biotechnology over the past decade, the primary structure, i.e., the amino acid sequence of a protein, can be determined by high-throughput methods. Determining a protein's amino acid sequence is fast and inexpensive, so it is desirable to infer the protein structure directly from the sequence. In nature, the primary structure determines the secondary structure, which in turn determines the tertiary structure and then the quaternary structure; each step up these four levels adds a further force or interaction acting on the protein. The function of a protein resides in its tertiary and quaternary structures, but predicting either of them is a nontrivial task. Various methods have been tried; here we focus only on neural network techniques that use the protein's sequence information. Ideally, the secondary structure is predicted from the sequence and the tertiary structure is then determined from the secondary structure, so predicting the secondary structure from the sequence is the first step. There are three major secondary structures, alpha helices, beta sheets, and coils, all of which are substructures of the tertiary structure. Protein sequences are composed of 20 kinds of amino acids, each denoted by a unique one-letter code. A terminus code is added to mark the two ends of the sequence, so each position can be encoded as a 21-bit binary vector. The time-delay neural network (TDNN) is widely used to classify each position of the amino acid sequence into one of the three secondary structure classes. However, choosing a proper window size for the TDNN is difficult. In this thesis, the gamma neural model, whose memory depth is adaptable, is tested as a way to further improve prediction accuracy.
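The 21-bit encoding and the TDNN input window described above can be sketched as follows. This is a minimal illustration, not the thesis's code: the alphabet ordering, the `-` symbol standing in for the terminus code, the default window size of 13, and the helper names `encode_residue` and `windows` are all hypothetical choices.

```python
# Hypothetical sketch: orthogonal (one-hot) encoding with a terminus symbol,
# plus TDNN-style sliding windows. Alphabet order, the "-" terminus symbol,
# and the default window size are assumptions, not taken from the thesis.
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard one-letter codes
TERMINUS = "-"                          # assumed symbol for the terminus code
ALPHABET = AMINO_ACIDS + TERMINUS       # 21 symbols -> 21 bits per position
INDEX = {symbol: i for i, symbol in enumerate(ALPHABET)}


def encode_residue(symbol: str) -> List[int]:
    """Return a 21-bit one-hot vector for one residue (or the terminus code)."""
    bits = [0] * len(ALPHABET)
    bits[INDEX[symbol]] = 1
    return bits


def windows(sequence: str, size: int = 13) -> List[List[int]]:
    """Build one flattened input vector per residue, centred on that residue.

    Positions that fall off either end of the sequence are filled with the
    terminus code, so every vector has exactly size * 21 bits.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd so it has a centre residue")
    half = size // 2
    padded = TERMINUS * half + sequence + TERMINUS * half
    result = []
    for centre in range(len(sequence)):
        window = padded[centre:centre + size]
        vector: List[int] = []
        for symbol in window:
            vector.extend(encode_residue(symbol))
        result.append(vector)
    return result


if __name__ == "__main__":
    inputs = windows("MKTAYIAK", size=5)
    print(len(inputs), len(inputs[0]))  # 8 105
```

Each residue yields one input vector of `size * 21` bits, which a TDNN-style classifier would map to helix, sheet, or coil. Note that the window size must be fixed before training, which is the choice the abstract identifies as hard to make.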
In addition, to provide more input information to the neural network, the chemical properties of the amino acids are taken into account in a new encoding of the sequence. Our experiments show that the gamma neural network (GNN) achieves results similar to those of the TDNN while requiring much less time. The new encoding also yields a higher beta-sheet prediction rate. Both techniques can be applied to existing neural networks for protein secondary structure prediction that use a sliding window and the conventional encoding.
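The adaptable memory depth of the gamma neural model can be illustrated with the standard gamma memory of de Vries and Príncipe. The abstract does not give the thesis's exact formulation, so the recurrence below, the class name `GammaMemory`, and the fixed parameter `mu` (which a gamma neural network would normally adapt during training) are assumptions for illustration only.

```python
# Hypothetical sketch of a discrete gamma memory, assuming the standard
# de Vries & Principe recurrence; mu is fixed here rather than learned.
from typing import List, Sequence


class GammaMemory:
    """Gamma memory with K taps sharing one parameter mu.

    Recurrence:
        x_0(n) = u(n)
        x_k(n) = (1 - mu) * x_k(n-1) + mu * x_{k-1}(n-1),  k = 1..K
    With mu = 1 this is an ordinary tapped delay line (a TDNN window);
    with 0 < mu < 1 the memory depth K / mu exceeds the number of taps.
    """

    def __init__(self, taps: int, mu: float) -> None:
        if not 0.0 < mu < 2.0:
            raise ValueError("mu must lie in (0, 2) for a stable gamma memory")
        self.taps = taps
        self.mu = mu
        self.state = [0.0] * (taps + 1)  # x_0 .. x_K from the previous step

    @property
    def memory_depth(self) -> float:
        """Mean delay of the last tap, K / mu."""
        return self.taps / self.mu

    def step(self, u: float) -> List[float]:
        """Advance one time step and return the new tap values x_0 .. x_K."""
        prev = self.state
        new = [u] + [
            (1.0 - self.mu) * prev[k] + self.mu * prev[k - 1]
            for k in range(1, self.taps + 1)
        ]
        self.state = new
        return new

    def run(self, signal: Sequence[float]) -> List[List[float]]:
        """Feed a whole signal, continuing from the current state."""
        return [self.step(u) for u in signal]


if __name__ == "__main__":
    delay_line = GammaMemory(taps=2, mu=1.0)
    print(delay_line.run([1.0, 2.0, 3.0, 4.0])[-1])  # [4.0, 3.0, 2.0]
```

Setting `mu = 1` reduces the memory to the fixed tapped delay line of a TDNN window, while a smaller `mu` stretches the memory depth `K / mu` without adding taps or weights. This is one plausible reason a gamma network can approach TDNN accuracy with fewer inputs and less computation, although the thesis's own timing comparison is not reproduced here.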
Appears in Collections:	[Graduate Institute & Department of Computer Science and Information Engineering] Thesis
