    Please use this identifier to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/87962

    Title: MOBILE01和PTT兩個不同論壇相同面向習慣用詞之探討 : 以電信, 寬頻版為例
    Other Titles: MOBILE01 and PTT two different forums discussion on the same face habit words : a case study of telecommunications, broadband
    Authors: 瞿怡正;Chu, Yi-Cheng
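    Contributors: Tamkang University, Department of Computer Science and Information Engineering, In-service Master's Program (淡江大學資訊工程學系碩士在職專班)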
    Keywords: 初始詞庫;人工標註;Initial Lexicon;Manual Tagging
    Date: 2012
    Issue Date: 2013-04-13 11:54:15 (UTC+8)
    Abstract: 本論文將針對兩種不同類型的論壇做比較,對相同領域而言,觀察兩種不同類型論壇詞典外的習慣用語是否相同?進而討論(1)利用其中一個較專業和發表與特定領域相關文章較多之論壇所訓練完成的詞庫當作另一論壇的初始詞庫是否有助於準確率和回收率的提升?(2)有了初始詞庫,為節省大量人力,是否可跳過第一階段人工標註的動作,直接利用系統第二階段的動作直接擷取文章中的意見詞或排除字?本論文會利用PTT與Mobile01的電信版和寬頻版來討論上述問題。依據實驗數據顯示,如果使用Mobile01排除字詞庫當初始詞庫,確實可以改善系統的準確率和回收率;但就非冷門的詞庫外意見詞使用習慣而言,PTT和Mobile01使用者的習慣用語仍有些許不同且數量不多,所以即便是使用Mobile01訓練完成的詞庫當作PTT初始詞庫,為了維護較高的準確率和回收率,不建議跳過第一階段人工標註的動作;但如果為了節省人力可以跳過第一階段人工標註的動作,直接利用系統第二階段的動作直接擷取文章中的意見詞或排除字。
    This thesis compares two different types of forums to examine whether, within the same domain, their users share the same habitual expressions that fall outside the dictionary. It then addresses two questions. (1) Does taking the lexicon trained on one forum, the more specialized one with more posts in the domain, as the initial lexicon for the other forum improve precision and recall? (2) Given an initial lexicon, can the first-stage manual tagging be skipped to save labor, so that the system's second stage extracts opinion words and exclusion words directly from the articles? The telecommunications and broadband boards of PTT and Mobile01 are used to study these questions. The experimental data show that using the Mobile01 exclusion-word lexicon as the initial lexicon does improve the system's precision and recall. However, for out-of-lexicon opinion words that are not obscure, the habitual expressions of PTT and Mobile01 users still differ slightly and are few in number. Therefore, even when the lexicon trained on Mobile01 serves as the initial lexicon for PTT, skipping the first-stage manual tagging is not recommended if relatively high precision and recall are to be maintained. If saving labor is the priority, the first-stage manual tagging can be skipped and the second stage used to extract opinion words or exclusion words directly from the articles.
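The abstract evaluates lexicon-based extraction by precision and recall. As a minimal sketch of how those two measures respond to an exclusion-word lexicon, the Python below scores a toy extraction against a hand-tagged gold set. The function names, the tokens, and both lexicons are hypothetical illustrations and are not taken from the thesis or its system.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted terms against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def extract_terms(tokens, opinion_lexicon, exclusion_lexicon=frozenset()):
    """Keep tokens that are in the opinion lexicon and not in the exclusion lexicon."""
    return {t for t in tokens if t in opinion_lexicon and t not in exclusion_lexicon}


# Hypothetical toy data: tokens from one post and the manually tagged opinion words.
tokens = ["fast", "slow", "signal", "laggy", "price"]
gold = {"fast", "slow", "laggy"}

# Hypothetical lexicons; "signal" is a false hit that the exclusion lexicon removes.
opinion_lexicon = {"fast", "slow", "signal"}
exclusion_lexicon = {"signal"}

without_exclusion = precision_recall(extract_terms(tokens, opinion_lexicon), gold)
with_exclusion = precision_recall(
    extract_terms(tokens, opinion_lexicon, exclusion_lexicon), gold
)
```

In this toy case the exclusion lexicon removes one false hit, so precision rises while recall is unchanged. Recall would rise only if missing opinion words such as "laggy" were added to the lexicon, which is the role that manual tagging plays in the abstract's first stage.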
    Appears in Collections: [Graduate Institute & Department of Computer Science and Information Engineering] Thesis

    All items in the institutional repository (機構典藏) are protected by copyright, with all rights reserved.
