Tamkang University Institutional Repository: Item 987654321/95864
    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/95864


    Title: 使用決策樹來抽取文件自動分類系統中之分類規則 Extracting Classification Rules in Automatic Document Classification Systems by Using Decision Trees
    Authors: 洪文斌
    Contributors: Department of Computer Science and Information Engineering, Tamkang University
    Keywords: Automatic Document Classification; Decision Tree; Information Retrieval; Machine Learning
    Date: 1999-11
    Issue Date: 2014-02-13
    Abstract: Since Maron published the first paper on automatic document classification in 1961, two traditional approaches have dominated: the probabilistic model and the vector space model. More recent research has added statistical analysis, expert systems, natural language processing, and artificial neural networks to improve classification accuracy. For automatic document classification, however, all of these methods act as black boxes, because their classification behavior and classification rules cannot be inspected. This study uses Quinlan's C4.5 decision trees, a machine learning technique, to extract classification rules from automatic document classification systems, so that the systems' classification behavior becomes transparent and the extracted rules can be used to further verify the correctness of the classification. The classification scheme of ACM Computing Reviews serves as the basis for classification. A total of 6424 papers covering 56 mid-level categories were collected from that journal as experimental data, and the title and source of each paper serve as its document profile. One tenth of the papers are used as test data and the rest as training data. Quinlan's decision trees extract 1162 classification rules from the training data. These rules are then used to classify the training documents and the test documents, giving recall rates of 67.7% on the training data and 45.5% on the test data. When the rules are further reduced to 290 classification rules, recall falls to 52.3% on the training data and drops slightly to 43.0% on the test data.
    Relation: Proceedings of the 4th Conference on Artificial Intelligence and Applications, pp. 160-166
    Appears in Collections: [Graduate Institute & Department of Computer Science and Information Engineering] Proceeding
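
The evaluation described in the abstract (hold out one tenth of the profiles, apply the extracted rules, report recall) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the rule format (a set of keywords that must all appear in the profile), the sample rules and documents, and the helper names `classify`, `recall`, and `split_holdout` are all hypothetical. Recall is assumed here to mean the fraction of documents that the rules assign to their correct class.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical rule format: (keywords that must all occur, predicted class).
# These are NOT the rules extracted in the paper, only toy examples.
Rule = Tuple[FrozenSet[str], str]
Doc = Tuple[str, str]  # (profile text, true class)


def classify(profile: str, rules: List[Rule]) -> Optional[str]:
    """Return the class of the first rule whose keywords all occur in the profile."""
    words = set(profile.lower().split())
    for keywords, label in rules:
        if keywords <= words:  # every keyword of the rule is present
            return label
    return None  # no rule fired


def recall(docs: List[Doc], rules: List[Rule]) -> float:
    """Fraction of documents whose true class is recovered (assumed definition)."""
    if not docs:
        return 0.0
    hits = sum(1 for profile, label in docs if classify(profile, rules) == label)
    return hits / len(docs)


def split_holdout(items: list, test_fraction: float = 0.1):
    """Deterministic split: every k-th item goes to the test set (k = 1/fraction)."""
    step = round(1 / test_fraction)
    test = [x for i, x in enumerate(items) if i % step == 0]
    train = [x for i, x in enumerate(items) if i % step != 0]
    return train, test


# Toy rules and profiles; class labels borrow ACM Computing Reviews style codes.
rules: List[Rule] = [
    (frozenset({"database"}), "H.2"),
    (frozenset({"neural", "networks"}), "I.2"),
    (frozenset({"compiler"}), "D.3"),
]
docs: List[Doc] = [
    ("query optimization in database systems", "H.2"),
    ("training neural networks faster", "I.2"),
    ("a compiler for functional languages", "D.3"),
    ("expert systems in medicine", "I.2"),  # no rule covers this profile
]

if __name__ == "__main__":
    print(recall(docs, rules))  # 3 of the 4 toy profiles are classified correctly
```

Because each rule is a readable keyword test, a misclassified profile can be traced to the rule that fired, or to the absence of any matching rule, which is the kind of transparency the abstract is after.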

    Files in This Item:

    File | Description | Size | Format
    使用決策樹來抽取文件自動分類系統中之分類規則_中文摘要.docx | Abstract | 20Kb | Microsoft Word
    使用決策樹來抽取文件自動分類系統中之分類規則_英文摘要.docx | Abstract | 21Kb | Microsoft Word

    All items in the institutional repository are protected by copyright, with all rights reserved.

