    Please use this permanent URL to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/34098


    Title: 運用重複句排除技術於中文文件自動摘要之研究
    Other title: Elimination of duplicate sentences in automatic summarization of Chinese documents
    Author: Chen, Tzu-yu (陳姿妤)
    Contributors: Master's Program, Department of Information Management, Tamkang University (淡江大學資訊管理學系碩士班)
    Wei, Shih-chieh (魏世杰)
    Keywords: Automatic Summarization; TFIDF; Similarity Measure; Hownet; Duplicate Sentences
    Date: 2007
    Uploaded: 2010-01-11 04:55:08 (UTC+8)
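    Abstract: This study targets Chinese documents and produces extractive summaries by selecting a set of important sentences from the original text. Important sentences are usually extracted through feature selection that captures the central concepts of a document, for example by computing word and sentence weights with TFIDF, or by using indicators such as special keywords, cue words, and sentence position to judge sentence importance.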
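    This study assumes that authors often restate the topic they are discussing, so sentences with similar meanings tend to form a high-scoring group when the central concepts of a document are extracted. We therefore compare the similarity between pairs of sentences to filter out sentences that repeat information in the summary. For sentence similarity, in addition to Boolean matching of co-occurring words, we also consider synonym matching. To this end we introduce Hownet (知網), a knowledge base of Chinese word semantics, and use its semantic definitions of words to compute synonym similarity.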
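    Experiments show that, for extracting important sentences, TFIDF-based word weighting combined with a feature measuring the similarity between each sentence and the title sentence raises the average precision of the summaries by about 7%. Within the summary, using Jaccard similarity combined with Hownet's notion of synonyms to eliminate duplicate sentences also improves summary precision.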
    This study addresses automatic summarization of Chinese documents. Important sentences are extracted from a document using sentence features such as the sum of the TFIDF weights in a sentence or the position of the sentence in the document.
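
As an illustration of the "sum of TFIDF weights" feature, here is a minimal sketch. It assumes the text is already segmented into word tokens and that IDF is computed as log(N / df) over a small reference corpus; the function names `idf_table` and `sentence_scores` are hypothetical, and the exact weighting used in the thesis is not stated in the abstract.

```python
import math
from collections import Counter

def idf_table(corpus):
    """Map each term to log(N / df), where corpus is a list of tokenized documents."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}

def sentence_scores(sentences, idf):
    """Score each tokenized sentence of one document.

    Assumption: a sentence's score is the sum, over its distinct terms,
    of (term frequency in the whole document) * (IDF of the term).
    Terms missing from the IDF table contribute 0.
    """
    tf = Counter(tok for sent in sentences for tok in sent)
    return [
        sum(tf[tok] * idf.get(tok, 0.0) for tok in set(sent))
        for sent in sentences
    ]

# Usage with toy, pre-tokenized data
corpus = [["summary", "text", "text"], ["text", "news"]]
doc_sentences = [["summary", "text"], ["text"]]
scores = sentence_scores(doc_sentences, idf_table(corpus))
```

In this toy data the term "text" appears in every document, so its IDF is zero and the second sentence scores lower than the first.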
    We assume that the important sentences extracted in this way may still contain redundant information, because authors tend to repeat their main ideas several times in a document. Under a given summary compression rate, this redundancy keeps other important sentences out of the summary. To solve this problem, we propose a sentence similarity measure that filters duplicate sentences out of a summary. The measure takes into account the co-occurrence of both exact and synonymous words in two sentences. To compute the similarity of synonymous words, we use Hownet, a Chinese lexical knowledge base comparable to the English lexical database WordNet.
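
The duplicate filter described above can be sketched as follows. This is not the thesis's actual measure: the dictionary `synonym_class` is a hypothetical stand-in for a Hownet-derived lookup that maps each word to a representative of its synonym group, and the `threshold` value is an arbitrary assumption. Sentences are assumed to be tokenized and already ranked by importance.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def normalize(tokens, synonym_class):
    """Replace each token by its synonym-group representative, if it has one."""
    return {synonym_class.get(tok, tok) for tok in tokens}

def drop_duplicates(ranked, synonym_class, threshold=0.5):
    """Greedily keep sentences, best first, skipping any sentence whose
    synonym-aware Jaccard similarity to an already kept sentence
    reaches the threshold."""
    kept = []
    for sent in ranked:
        cand = normalize(sent, synonym_class)
        if all(jaccard(cand, normalize(k, synonym_class)) < threshold for k in kept):
            kept.append(sent)
    return kept

# Usage: sentences listed from most to least important
ranked = [
    ["the", "car", "is", "fast"],
    ["the", "automobile", "is", "fast"],
    ["rain", "falls"],
]
synonyms = {"automobile": "car"}  # hypothetical stand-in for Hownet
summary = drop_duplicates(ranked, synonyms, threshold=0.7)
```

With exact matching only, the first two sentences share 3 of 5 distinct words (similarity 0.6) and both survive a 0.7 threshold; once "automobile" is mapped to "car" they become identical and the second is dropped, which frees a slot for another sentence under a fixed compression rate.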
    The results show that a combined sentence feature, using the sum of TFIDF weights together with similarity to the title sentence, improves average precision by about 7%. For the elimination of duplicate sentences, a similarity measure based on Jaccard similarity and Hownet also improves the precision of the automatic summarization results.
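
One way to combine the two sentence features is a weighted sum, sketched below. The abstract does not say how the features are combined, so the linear form, the mixing weight `alpha`, the max-normalization of the TFIDF scores, and the use of Jaccard similarity against the title are all assumptions made for illustration.

```python
def title_similarity(sentence, title):
    """Jaccard similarity between a tokenized sentence and the tokenized title."""
    s, t = set(sentence), set(title)
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def combined_scores(sentences, tfidf_scores, title, alpha=0.5):
    """Blend a normalized TFIDF score with title similarity.

    alpha is a hypothetical mixing weight: 1.0 uses TFIDF only,
    0.0 uses title similarity only.
    """
    top = max(tfidf_scores, default=0.0)
    combined = []
    for sent, score in zip(sentences, tfidf_scores):
        norm = score / top if top > 0 else 0.0  # scale TFIDF sums into [0, 1]
        combined.append(alpha * norm + (1 - alpha) * title_similarity(sent, title))
    return combined

# Usage with toy, pre-tokenized data and made-up TFIDF sums
sentences = [["stock", "prices", "fell"], ["summary", "of", "chinese", "text"]]
tfidf_sums = [2.0, 1.0]
title = ["chinese", "text", "summary"]
blended = combined_scores(sentences, tfidf_sums, title)
```

Here the second sentence has the lower TFIDF sum but overlaps heavily with the title, so it overtakes the first sentence once the title feature is included.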
    Appears in collections: [Department and Graduate Institute of Information Management] Theses and Dissertations


    All items in the institutional repository are protected by their original copyright.
