    Please use this permanent URL to cite or link to this document: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/95864


    Title: Extracting Classification Rules in Automatic Document Classification Systems by Using Decision Trees (使用決策樹來抽取文件自動分類系統中之分類規則)
    Author: 洪文斌
    Contributor: Department of Computer Science and Information Engineering, Tamkang University (淡江大學資訊工程學系)
    Keywords: Automatic Document Classification; Decision Tree; Information Retrieval; Machine Learning
    Date: 1999-11
    Uploaded: 2014-02-13
    Abstract: Since Maron published the first paper on automatic document classification in 1961, traditional approaches have relied on either the probabilistic model or the vector space model. More recent research has added statistical analysis, expert systems, natural language processing, and artificial neural networks to improve classification accuracy. For automatic document classification, however, all of these methods are effectively black boxes, because their classification behavior and classification rules cannot be inspected. This study uses Quinlan's C4.5 decision trees, a machine learning technique, to extract classification rules from an automatic document classification system, with the aim of making the system's behavior transparent and letting people use the extracted rules to verify the correctness of the classification. The classification scheme of ACM Computing Reviews serves as the basis. A total of 6424 papers covering 56 mid-level classes were collected from that journal, and each paper's title and source were used as its document profile. One tenth of the papers were used as test data and the rest as training data. Quinlan's decision trees yielded 1162 classification rules from the training data. When these rules were applied to the training and test documents, the recall rates were 67.7% and 45.5%, respectively. When the rules were further simplified to 290 classification rules, the recall rates became 52.3% for the training data and 43.0% for the test data.
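    The rule-extraction idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes binary word-presence features drawn from titles, uses plain information gain where C4.5 uses gain ratio and pruning, and takes "recall" to mean the fraction of documents assigned their correct class. The toy titles, the class names, and all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_word(docs, labels, words):
    """Word whose presence/absence split has the highest information gain (None if no gain)."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for w in sorted(words):  # sorted so that ties are broken deterministically
        has = [l for d, l in zip(docs, labels) if w in d]
        lacks = [l for d, l in zip(docs, labels) if w not in d]
        if not has or not lacks:
            continue
        remainder = (len(has) * entropy(has) + len(lacks) * entropy(lacks)) / len(labels)
        gain = base - remainder
        if gain > best_gain + 1e-12:
            best, best_gain = w, gain
    return best

def build_tree(docs, labels, words):
    """Leaf = class label; internal node = (word, subtree_if_present, subtree_if_absent)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    w = best_word(docs, labels, words)
    if w is None:
        return majority
    yes = [(d, l) for d, l in zip(docs, labels) if w in d]
    no = [(d, l) for d, l in zip(docs, labels) if w not in d]
    rest = words - {w}
    return (w,
            build_tree([d for d, _ in yes], [l for _, l in yes], rest),
            build_tree([d for d, _ in no], [l for _, l in no], rest))

def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into one (conditions, class) rule."""
    if not isinstance(tree, tuple):
        return [(conditions, tree)]
    w, yes, no = tree
    return (extract_rules(yes, conditions + ((w, True),)) +
            extract_rules(no, conditions + ((w, False),)))

def classify(rules, doc):
    """Return the class of the first rule whose conditions all hold, else None."""
    for conditions, label in rules:
        if all((w in doc) == present for w, present in conditions):
            return label
    return None

def recall(rules, docs, labels):
    """Fraction of documents whose predicted class equals the true class (assumed metric)."""
    return sum(classify(rules, d) == l for d, l in zip(docs, labels)) / len(labels)

# Hypothetical toy profiles: each document is the set of words in its title.
train = [
    ("parallel sorting on hypercube", "Algorithms"),
    ("fast sorting networks", "Algorithms"),
    ("relational query optimization", "Databases"),
    ("query processing in distributed databases", "Databases"),
]
train_docs = [set(title.split()) for title, _ in train]
train_labels = [label for _, label in train]
vocabulary = set().union(*train_docs)

rules = extract_rules(build_tree(train_docs, train_labels, vocabulary))
for conditions, label in rules:
    tests = " AND ".join(f"{'has' if p else 'lacks'} '{w}'" for w, p in conditions)
    print(f"IF {tests} THEN {label}")
```

    Because each rule is a readable conjunction of word tests, a person can inspect it directly, which is the transparency the abstract argues for. Calling `recall(rules, docs, labels)` on the training set and on a held-out set gives the two figures that the abstract compares.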
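    Relation: Proceedings of the 4th Conference on Artificial Intelligence and Applications (第四屆人工智慧與應用研討會論文集), pp. 160-166
    Appears in collections: [Department and Graduate Institute of Computer Science and Information Engineering] Conference Papers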

    Files in this item:

    File | Description | Size | Format | Views
    使用決策樹來抽取文件自動分類系統中之分類規則_中文摘要.docx | Abstract | 20 KB | Microsoft Word | 70
    使用決策樹來抽取文件自動分類系統中之分類規則_英文摘要.docx | Abstract | 21 KB | Microsoft Word | 43

    All items in the institutional repository are protected by their original copyright.
