    Please use this identifier to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/87989

    Title: 中文意見探勘系統設計
    Other Titles: Design of a Chinese opinion mining system
    Authors: 簡立;Chien, Lee
    Contributors: Master's Program, Department of Computer Science and Information Engineering, Tamkang University (淡江大學資訊工程學系碩士班)
    蔣璿東;Chiang, Rui-Dong
    Keywords: 意見探勘;冷門詞;熱門詞;排除字;詞庫穩定;Opinion Mining;unpopular words;popular words;exclude words;lexicon stable
    Date: 2012
    Issue Date: 2013-04-13 11:55:34 (UTC+8)
    Abstract: Chinese differs from English in that written words are not separated by spaces, so locating opinion words with a part-of-speech (POS) tagger or a parser is error-prone. This thesis therefore extracts opinion words with a lexicon-based approach and combines it with the proposed exclusion-word method to improve extraction precision.
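The lexicon lookup with exclusion words can be illustrated with a minimal sketch. It rests on an assumption that the abstract does not spell out: an exclusion word is taken to be a longer string that contains an opinion word, and an opinion-word match lying entirely inside such a string is discarded. The function names and the toy lexicons are hypothetical and are not taken from the thesis.

```python
def find_spans(text, words):
    """Return (start, end, word) for every occurrence of every word in text."""
    spans = []
    for w in words:
        if not w:
            continue  # ignore empty entries
        start = text.find(w)
        while start != -1:
            spans.append((start, start + len(w), w))
            start = text.find(w, start + 1)
    return spans


def extract_opinion_words(text, opinion_lexicon, exclusion_lexicon):
    """Lexicon lookup that skips matches covered by an exclusion word.

    Assumption (not from the thesis): an opinion-word match is dropped
    when it lies entirely inside an occurrence of an exclusion word.
    """
    excluded = find_spans(text, exclusion_lexicon)
    hits = []
    for start, end, word in sorted(find_spans(text, opinion_lexicon)):
        covered = any(ex_start <= start and end <= ex_end
                      for ex_start, ex_end, _ in excluded)
        if not covered:
            hits.append((start, word))
    return hits


# Toy example: "好" (good) is an opinion word, but inside "好像" (seems) it is not.
text = "這支手機好像很好"
print(extract_opinion_words(text, {"好"}, {"好像"}))  # [(7, '好')]
```

Without the exclusion lexicon, the same call would also report the "好" at position 4, which is the kind of false match the exclusion-word method is meant to suppress.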
    Each domain has its own terminology and idioms (opinion words and exclusion words), so general-purpose dictionaries can hardly cover all the opinion words of a specific domain. For a given domain, however, sufficient training data capture most of the opinion words and exclusion words that the dictionaries lack. The out-of-dictionary opinion words and exclusion words still missing from the training set are few, their number remains stable, and they tend to be infrequently used. The experiments in this thesis use data from two different but similar telecommunications domains on Mobile01.
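One way to check the stability claim is to count how many previously unseen words each successive batch of tagged data contributes. The sketch below is only an illustration of that idea: the function name, the batching scheme, and the English toy words are assumptions, and the numbers are made up rather than results from the thesis.

```python
def new_words_per_batch(batches, dictionary):
    """Count, per batch, the tagged words not yet in the dictionary or earlier batches."""
    known = set(dictionary)
    counts = []
    for tagged_words in batches:
        new = set(tagged_words) - known
        counts.append(len(new))
        known |= new  # words seen once are no longer new
    return counts


# Made-up toy data: each inner list holds the words tagged in one batch.
dictionary = {"good", "bad"}
batches = [
    ["good", "laggy", "snappy"],
    ["laggy", "bad", "pricey"],
    ["snappy", "good"],
]
print(new_words_per_batch(batches, dictionary))  # [2, 1, 0]
```

A count that falls toward zero and stays low as batches accumulate would be consistent with the lexicon having reached a stable state for that domain.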
    Because this thesis uses a lexicon to capture opinion words and exclusion words, every such word already in the dictionaries can be captured. Those outside the dictionaries can be identified only by manual tagging, which is time-consuming and labor-intensive. Given that the set of new out-of-dictionary opinion words and exclusion words is stable, this thesis proposes a two-stage lexicon training method to address the problem. In the first stage, opinion words and exclusion words are captured from the training data by semi-automated manual tagging. In the second stage, once the system is online, the dictionaries are used directly to capture opinion words and exclusion words from articles, and the captured words are then manually inspected for accuracy. According to the experimental data, the second-stage procedure saves a great deal of manual tagging time.
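The two-stage procedure described above can be sketched as follows. This is an outline under stated assumptions, not the thesis's implementation: the `Lexicon` class, the `tag_manually` and `inspect` callbacks (stand-ins for the human steps), and the rule that a rejected capture adds an exclusion word are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Lexicon:
    opinion: set = field(default_factory=set)
    exclusion: set = field(default_factory=set)


def stage_one(lexicon, training_docs, tag_manually):
    """Stage 1: a human tagger labels every training document in full."""
    for doc in training_docs:
        opinion_words, exclusion_words = tag_manually(doc)
        lexicon.opinion |= set(opinion_words)
        lexicon.exclusion |= set(exclusion_words)


def capture(lexicon, doc):
    """Lexicon lookup: mask exclusion words, then report opinion words still present."""
    for ex in lexicon.exclusion:
        doc = doc.replace(ex, " " * len(ex))
    return sorted(w for w in lexicon.opinion if w in doc)


def stage_two(lexicon, doc, inspect):
    """Stage 2: capture by lookup first, then a human inspects only the captures.

    Assumption: inspect(doc, word) returns None when the capture is correct,
    or an exclusion word to add when it is a false match.
    """
    accepted = []
    for word in capture(lexicon, doc):
        verdict = inspect(doc, word)
        if verdict is None:
            accepted.append(word)
        else:
            lexicon.exclusion.add(verdict)
    return accepted


lex = Lexicon()
stage_one(lex, ["螢幕很好"], lambda doc: (["好"], []))
first = stage_two(lex, "好像還可以", lambda doc, word: "好像")   # reviewer rejects "好"
second = stage_two(lex, "好像很好", lambda doc, word: None)      # reviewer accepts
print(first, second, lex.exclusion)
```

In this sketch the human in stage two reviews only the words the lexicon already captured instead of reading and tagging each article from scratch, which is where the time saving reported in the abstract would come from.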
    Appears in Collections: [Department of Computer Science and Information Engineering (資訊工程學系暨研究所)] Theses and Dissertations

    All items in the institutional repository (機構典藏) are protected by copyright, with all rights reserved.
