Tamkang University Institutional Repository (淡江大學機構典藏): Item 987654321/74585
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library & TKU Library IR team.
    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/74585


    Title: A News Web Page Classification System Based on Chinese Word Segmentation (基於中文斷詞技術之新聞網頁分類系統)
    Other Titles: Automatic news pages classification system based on Chinese word segmentation
    Authors: 林孟翰;Lin, Meng-Han
    Contributors: Tamkang University, Department of Computer Science and Information Engineering, Master's Program (淡江大學資訊工程學系碩士班)
    蔡憶佳;Tsai, Yih-Jia
    Keywords: Naive Bayes Classifier (貝氏分類法); recall rate (查全率)
    Date: 2011
    Issue Date: 2011-12-28 18:57:49 (UTC+8)
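    Abstract: With the development of the Internet in recent years, the network has become an indispensable part of everyday life. Its convenience and interactivity let users find out about recent events, and these same properties have caused online news to grow very rapidly. This raises a problem that must now be addressed: how can users obtain accurate or relevant information?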
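    This thesis builds a news web page classification system based on Chinese word segmentation. For articles retrieved from the Web, a statistical segmentation method counts how many times each word occurs. A threshold is then set, and any word whose count does not exceed the threshold is deleted from the lexicon. The remaining words are used with two classification methods, plain Naive Bayes and weighted Naive Bayes, to determine which performs better.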
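The counting-and-threshold step can be illustrated with a short sketch. It assumes that candidate words are contiguous character n-grams of length 2 to 3 taken from runs of CJK ideographs, and that `char_ngrams`, `build_lexicon`, `n_min`, `n_max`, and `threshold` are hypothetical names. The thesis does not state its n-gram lengths or threshold value, so this is not the author's implementation. Only the pruning rule (drop a word whose count does not exceed the threshold) comes from the abstract.

```python
import re
from collections import Counter

# Assumption: treat each maximal run of CJK ideographs as one segment, so
# candidate words never span punctuation, digits, or Latin text.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def char_ngrams(text: str, n_min: int = 2, n_max: int = 3) -> list[str]:
    """Return every contiguous character n-gram (n_min <= n <= n_max) inside CJK runs."""
    grams = []
    for run in CJK_RUN.findall(text):
        for n in range(n_min, n_max + 1):
            for i in range(len(run) - n + 1):
                grams.append(run[i:i + n])
    return grams


def build_lexicon(articles: list[str], threshold: int,
                  n_min: int = 2, n_max: int = 3) -> dict[str, int]:
    """Count candidate words over all articles; keep those whose count exceeds threshold."""
    counts = Counter()
    for article in articles:
        counts.update(char_ngrams(article, n_min, n_max))
    # Candidates whose count does not exceed the threshold are removed from the lexicon.
    return {gram: c for gram, c in counts.items() if c > threshold}


articles = ["股市上漲,股市交易熱絡", "股市下跌"]
print(build_lexicon(articles, threshold=2))  # {'股市': 3}
```

In this toy input only "股市" (stock market) appears more than twice, so every other candidate n-gram is pruned. On a real corpus the threshold trades lexicon size against the risk of discarding genuine but rare words.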
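    Experimental results show that plain Naive Bayes classification outperforms weighted Naive Bayes classification, with recall reaching up to 71%. The results indicate that using a threshold to remove incorrect words, combined with plain Naive Bayes classification, is effective.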
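The plain (unweighted) Naive Bayes classifier can be illustrated as follows. This is a minimal multinomial Naive Bayes sketch. It assumes that documents arrive already segmented into term lists, that only lexicon terms are counted, and that add-one (Laplace) smoothing is used. The class name `MultinomialNaiveBayes`, the smoothing choice, and the toy data are all assumptions. The abstract does not describe the weighting used in the weighted variant, so that variant is not shown.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Hypothetical plain multinomial Naive Bayes with add-one (Laplace) smoothing.

    Assumption: each document is a list of already-segmented terms, and only
    terms in `vocabulary` (e.g. a threshold-pruned lexicon) are counted.
    """

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)
        self.class_doc_counts = Counter()        # documents per class
        self.term_counts = defaultdict(Counter)  # term frequencies per class
        self.total_terms = Counter()             # total kept terms per class

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNaiveBayes":
        for terms, label in zip(docs, labels):
            self.class_doc_counts[label] += 1
            kept = [t for t in terms if t in self.vocabulary]
            self.term_counts[label].update(kept)
            self.total_terms[label] += len(kept)
        return self

    def predict(self, terms: list[str]) -> str:
        n_docs = sum(self.class_doc_counts.values())
        v = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            # log P(c) + sum over lexicon terms of log P(t | c), smoothed by +1
            score = math.log(doc_count / n_docs)
            denom = self.total_terms[label] + v
            for t in terms:
                if t in self.vocabulary:
                    score += math.log((self.term_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


vocab = {"球賽", "得分", "冠軍", "股市", "上漲", "利率"}
docs = [["球賽", "得分"], ["球賽", "冠軍"], ["股市", "上漲"], ["股市", "利率"]]
labels = ["sports", "sports", "finance", "finance"]
model = MultinomialNaiveBayes(vocab).fit(docs, labels)
print(model.predict(["股市", "利率"]))  # finance
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids floating-point underflow on long news articles.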
    With the vigorous development of the Internet, the network has become indispensable to many people's everyday lives. Because reading news online is convenient, the number of users who learn about recent events from the Internet is growing rapidly. This has also led many news agencies to publish their news online. How to help users receive news that is relevant to them or that interests them is therefore an important issue. One approach is to build an automatic news classification system that lets users read news from the categories they are interested in.
    In this thesis, a news page classification system based on Chinese word segmentation is built. The system automatically downloads news pages and uses an n-gram algorithm for word segmentation. After segmentation, the performance of two classification schemes is compared. The Naïve Bayes classifier achieves the higher recall rate, with an average recall of 71%. Experimental results show that the Naïve Bayes classifier combined with n-gram word segmentation performs better than the weighted Bayesian scheme.
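Recall for a category is the fraction of articles truly in that category that the classifier assigns to it. The abstract does not say how the average recall is computed. The sketch below assumes an unweighted (macro) average over categories. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_recall(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Recall for class c = correctly predicted c / all items truly labelled c."""
    actual = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / n for c, n in actual.items()}


def macro_recall(y_true: list[str], y_pred: list[str]) -> float:
    """Assumption: 'average recall' taken as the unweighted mean over classes."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


y_true = ["sports", "sports", "finance", "finance"]
y_pred = ["sports", "finance", "finance", "finance"]
print(per_class_recall(y_true, y_pred))  # {'sports': 0.5, 'finance': 1.0}
print(macro_recall(y_true, y_pred))      # 0.75
```

A micro average, which pools all correct predictions before dividing, would weight large categories more heavily. The two averages can differ when category sizes are unbalanced.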
    Appears in Collections: [Graduate Institute & Department of Computer Science and Information Engineering] Thesis

    Files in This Item:

    File        Size   Format
    index.html  0 Kb   HTML

    All items in the Institutional Repository (機構典藏) are protected by copyright, with all rights reserved.

