    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/35092


    Title: Applying Support Vector Machines to Cancer Microarray Data Classification (應用支持向量機於癌症微陣列資料識別)
    Other Titles: Cancer classification on microarray expression data with support vector machine
    Authors: 呂明達;Lu, Ming-da
    Contributors: Tamkang University, Master's Program, Department of Computer Science and Information Engineering
    許輝煌;Hsu, Hui-huang
    Keywords: Cancer Classification;Microarray;Support Vector Machine;Feature Selection;Pearson Correlation Coefficient
    Date: 2008
    Issue Date: 2010-01-11 06:00:32 (UTC+8)
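    Abstract: Microarrays are an important tool for gene analysis today and can help distinguish among many types of cancer. We carried out a classification task on cancer microarray data, using feature selection methods from computer science to reduce the data and the support vector machine, a machine learning method, to make predictions.
    We applied these two tools to three cancer microarray data sets: leukemia, lung cancer, and prostate cancer. We used two families of feature selection methods: Euclidean distance feature selection, a distance-measure method, and Pearson correlation coefficient feature selection, a dependency-measure method. We ran the support vector machine with different numbers of features and three different kernel functions. The results show that distance-based feature selection is well suited to the support vector machine classifier, and that the linear kernel is the best kernel for the three problems studied. Across the three data sets, reducing at least 7129 features to between 15 and 100 features still yields equal or better prediction results.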
    Microarrays are an important tool in gene analysis research. They can help identify genes that might cause various cancers. In this thesis, we use feature selection methods and the support vector machine (SVM) to search for disease-causing genes in microarray data for three different cancers. The feature selection methods are based on Euclidean distance (ED) and the Pearson correlation coefficient (PCC). We selected three widely referenced microarray data sets for classification: the AML and ALL data set, the lung cancer data set, and the prostate data set. We investigated the effect on prediction results of training the SVM with different numbers of features and different kinds of kernels. The results show that the linear kernel is the best-suited kernel for this problem. Also, equal or higher accuracy can be achieved with only 15 to 100 features selected from the 7129 or more features of the original data sets.
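    The feature-ranking step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not give the exact scoring formulas, so the sketch assumes that the PCC score of a gene is the absolute Pearson correlation between its expression values and binary class labels, and that the ED score is the distance between the two class means on that gene. The function names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no class information
    return cov / (sx * sy)


def centroid_distance(xs, ys):
    """Assumed ED score: distance between the class means of one feature."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def top_k_features(samples, labels, k, score):
    """Return the indices of the k features with the largest |score|."""
    n_features = len(samples[0])
    scored = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        scored.append((abs(score(column, labels)), j))
    # Highest score first; ties are broken by the lower feature index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored[:k]]


# Hypothetical toy data: 6 samples x 4 "genes", labels 1 and 0 for two classes.
samples = [
    [5.0, 1.0, 0.2, 3.0],
    [5.1, 0.9, 0.9, 3.0],
    [4.9, 1.1, 0.1, 3.0],
    [1.0, 1.0, 0.8, 3.0],
    [1.2, 0.9, 0.3, 3.0],
    [0.9, 1.1, 0.7, 3.0],
]
labels = [1, 1, 1, 0, 0, 0]

pcc_selected = top_k_features(samples, labels, 2, pearson)
ed_selected = top_k_features(samples, labels, 2, centroid_distance)
print("PCC ranking keeps features:", pcc_selected)
print("ED ranking keeps features:", ed_selected)
```

    In a full pipeline, the selected columns would then be passed to an SVM classifier, for example scikit-learn's `sklearn.svm.SVC` with `kernel="linear"`, and the number of retained features `k` varied (the abstract reports 15 to 100) to compare accuracy.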
    Appears in Collections: [Department of Computer Science and Information Engineering and Graduate Institute] Theses and Dissertations
