    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/52389


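    Title: 新聞網頁自動分類系統
    Other title: Automatic classification system of news pages
    Author: 林大澈 (Lin, Ta-che)
    Contributors: Master's Program, Department of Computer Science and Information Engineering, Tamkang University; 陳伯榮 (Chen, Po-zung)
    Keywords: Naive Bayes classifier; recall rate
    Date: 2010
    Uploaded: 2010-09-23 17:35:39 (UTC+8)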
    Abstract: With the rapid growth of the Internet, more and more information is available online. This growth has also created a new problem: there is a very large volume of news and other information, and every news site classifies its articles differently, so organizing and absorbing this information quickly has become a problem that must be addressed.
    In this thesis, we build a system that automatically monitors and updates news web pages and classifies them automatically; the research focuses on the news-page classification component. The classification method is based on the Naïve Bayes classifier. When the system computes the probability that a news article belongs to each category, it also computes a weight for each word. This improves classification accuracy and reduces the cases in which an article is assigned to several categories at once or cannot be classified at all (that is, it appears to belong to every category).
    After the training module was trained on 960 news articles, the classification system had a basic ability to distinguish between news categories. Testing on 200 news articles then showed an average recall rate of 78%, a reasonably acceptable result. The experimental results show that the system achieves fairly good classification performance when the text of each article is segmented into words using a lexicon, the resulting words serve as feature data for the training module, word-frequency statistics are collected, and a Bayesian classification method that incorporates word weights is applied.
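The abstract states that article text is segmented into words with a lexicon before word frequencies are counted, but it does not name the segmentation algorithm. The sketch below assumes forward maximum matching, one common lexicon-based method: at each position it takes the longest lexicon word and falls back to a single character. The function name `forward_max_match`, the `max_len` limit, and the toy lexicon are hypothetical illustrations, not the thesis's actual implementation.

```python
from collections import Counter


def forward_max_match(text, lexicon, max_len=4):
    """Greedy lexicon-based segmentation (forward maximum matching).

    ASSUMPTION: the thesis only says a lexicon is used; this algorithm is illustrative.
    At each position the longest lexicon word is taken; a character that starts
    no lexicon word is emitted on its own.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single character is always accepted as a fallback.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy lexicon; the lexicon used in the thesis is not specified.
lexicon = {"新聞", "網頁", "自動", "分類", "系統"}
tokens = forward_max_match("新聞網頁自動分類系統", lexicon)
print(tokens)          # ['新聞', '網頁', '自動', '分類', '系統']
print(Counter(tokens)) # word-frequency counts that could feed a training module
```

Greedy longest-match is simple and fast but can mis-split ambiguous strings; a larger lexicon or a different segmenter would change the tokens, and therefore the word-frequency features.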
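The abstract says the classifier is based on Naïve Bayes and that, while computing category probabilities, it also computes a weight for each word so that fewer articles end up in several categories or in all of them. It does not give the weighting formula. The sketch below is a multinomial Naive Bayes with add-one smoothing in which each word's log-likelihood term is scaled by a per-word weight. The weight chosen here, one minus the normalized entropy of the word's distribution across categories, is a hypothetical stand-in, as are the class and method names.

```python
import math
from collections import Counter


class WeightedNaiveBayes:
    """Multinomial Naive Bayes with per-word weights on the log-likelihood terms.

    ASSUMPTION: the weight (1 - normalized entropy of a word's category
    distribution) is illustrative; the thesis's weighting formula is not given.
    """

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one category per document.
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.vocab = set().union(*self.word_counts.values())
        self.weights = {w: self._weight(w) for w in self.vocab}
        return self

    def _likelihood(self, word, c):
        # Laplace (add-one) smoothed P(word | c).
        return (self.word_counts[c][word] + 1) / (self.totals[c] + len(self.vocab))

    def _weight(self, word):
        # Normalize the word's likelihoods into a distribution over categories.
        probs = [self._likelihood(word, c) for c in self.classes]
        total = sum(probs)
        dist = [p / total for p in probs]
        entropy = -sum(p * math.log(p) for p in dist if p > 0)
        max_entropy = math.log(len(self.classes)) if len(self.classes) > 1 else 1.0
        # ~0 for a word spread evenly over all categories, larger for category-specific words.
        return 1.0 - entropy / max_entropy

    def scores(self, tokens):
        n_docs = sum(self.doc_counts.values())
        result = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for word, tf in Counter(tokens).items():
                if word in self.vocab:  # words unseen in training are ignored
                    score += tf * self.weights[word] * math.log(self._likelihood(word, c))
            result[c] = score
        return result

    def predict(self, tokens):
        scores = self.scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical pre-segmented training data.
docs = [["股市", "上漲", "投資"], ["球隊", "比賽", "投資"], ["股市", "下跌"], ["比賽", "冠軍"]]
labels = ["finance", "sports", "finance", "sports"]
model = WeightedNaiveBayes().fit(docs, labels)
print(model.predict(["股市", "投資"]))  # finance
```

In this toy data, "投資" occurs equally in both categories, so its weight is about 0 and it no longer pulls the article toward every category at once. That is the kind of ambiguity the abstract says the word weights are meant to reduce.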
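The abstract reports an average recall rate of 78% on 200 test articles but does not say how the average is taken. The sketch below assumes a macro average: recall is computed for each category (correctly classified articles divided by the articles that truly belong to that category), and the per-category values are then averaged without weighting. The function names are hypothetical.

```python
from collections import Counter


def per_class_recall(true_labels, predicted_labels):
    """Recall per category: correct predictions / articles truly in that category."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def average_recall(true_labels, predicted_labels):
    """Macro-averaged recall (ASSUMPTION: unweighted mean over categories)."""
    recalls = per_class_recall(true_labels, predicted_labels)
    return sum(recalls.values()) / len(recalls)


truth = ["finance", "finance", "sports", "sports"]
pred = ["finance", "sports", "sports", "sports"]
print(per_class_recall(truth, pred))  # {'finance': 0.5, 'sports': 1.0}
print(average_recall(truth, pred))    # 0.75
```

If the average were instead taken over all test articles for single-label classification, it would reduce to overall accuracy, so the two readings can give different numbers when categories have unequal sizes.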
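    Appears in collections: [Department and Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations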


    All items in the institutional repository are protected by their original copyright.
