RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library & TKU Library IR team.
    Please use this identifier to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/34098

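    Title: 運用重複句排除技術於中文文件自動摘要之研究 (A Study on Applying Duplicate-Sentence Elimination to the Automatic Summarization of Chinese Documents)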
    Other Titles: Elimination of duplicate sentences in automatic summarization of Chinese documents
    Authors: 陳姿妤;Chen, Tzu-yu
    Contributors: Master's Program, Department of Information Management, Tamkang University (淡江大學資訊管理學系碩士班)
    魏世杰;Wei, Shih-chieh
    Keywords: Automatic Summarization; TFIDF; Similarity Measure; HowNet; Duplicate Sentence Elimination
    Date: 2007
    Issue Date: 2010-01-11 04:55:08 (UTC+8)
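    Abstract: This study produces extractive summaries of Chinese documents by selecting a set of important sentences from the original text. Important sentences are usually extracted through feature selection that captures a document's central concepts, for example by computing term and sentence weights with TFIDF, or by using indicators such as special keywords, cue words, and sentence position as criteria for judging sentence importance.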
    This is a study of automatic summarization of Chinese documents. We extract important sentences from documents based on sentence features such as the sum of the TFIDF weights in a sentence or the sentence's location in the document.
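The sentence-scoring feature described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: sentences are assumed to be already word-segmented, IDF is estimated from a background corpus with a smoothed formula of our choosing, each distinct term is counted once per sentence, and the hypothetical `rate` parameter plays the role of the summary compression rate.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tfidf_weights(doc_tokens: Sequence[str], corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """TFIDF weight of every term in one document.

    doc_tokens: the document's terms, already word-segmented (assumption).
    corpus: background documents (lists of terms) used to estimate IDF.
    """
    doc_sets = [set(d) for d in corpus]
    n_docs = len(doc_sets)
    weights = {}
    for term, tf in Counter(doc_tokens).items():
        df = sum(1 for d in doc_sets if term in d)
        # Smoothed IDF (an assumed variant): no division by zero, always >= 1.
        idf = math.log((n_docs + 1) / (df + 1)) + 1
        weights[term] = tf * idf
    return weights


def score_sentences(sentences: Sequence[Sequence[str]], corpus: Sequence[Sequence[str]]) -> List[float]:
    """Score each sentence as the sum of the TFIDF weights of its distinct terms."""
    doc_tokens = [t for s in sentences for t in s]
    w = tfidf_weights(doc_tokens, corpus)
    return [sum(w[t] for t in set(s)) for s in sentences]


def extract_summary(sentences, corpus, rate: float = 0.3) -> List[int]:
    """Return indices of the top-scoring sentences, in original document order."""
    scores = score_sentences(sentences, corpus)
    k = max(1, round(len(sentences) * rate))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

A position feature could be folded in the same way, for instance by adding a bonus to scores of sentences near the start of the document; how the thesis weights position is not stated here.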
    We assume that the important sentences extracted this way may still contain redundant information, as authors tend to repeat their main ideas several times in a document. At a given summary compression rate, this redundancy crowds out other important sentences. To solve this problem, we propose a sentence similarity measure that filters duplicate sentences out of a summary. The measure takes into account the co-occurrence of both identical words and synonyms in two sentences. To compute the similarity of synonyms, we use HowNet, a Chinese counterpart to the English lexical database WordNet.
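The filtering idea above can be sketched as a "soft" Jaccard similarity in which identical words count fully and synonyms count partially, followed by a greedy pass that skips any candidate too similar to a sentence already chosen. This is an assumed formulation, not the thesis's exact measure: the `SYNONYMS` table is a hypothetical stand-in for a HowNet-based word similarity, and the 0.5 threshold is illustrative.

```python
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical stand-in for HowNet: maps an unordered word pair to a
# similarity in [0, 1]. Real HowNet similarity is derived from sememe
# definitions rather than looked up in a hand-written table.
SYNONYMS: Dict[FrozenSet[str], float] = {
    frozenset({"電腦", "計算機"}): 0.9,  # both mean "computer"
}


def word_sim(a: str, b: str) -> float:
    """1.0 for identical words, a table score for synonyms, else 0.0."""
    if a == b:
        return 1.0
    return SYNONYMS.get(frozenset({a, b}), 0.0)


def soft_jaccard(s1: Sequence[str], s2: Sequence[str], sim=word_sim) -> float:
    """Jaccard similarity where synonym matches contribute partial overlap.

    With exact matches only, this reduces to |A & B| / |A | B|.
    """
    a, b = set(s1), set(s2)
    if not a or not b:
        return 0.0
    # Best match of each word in the other sentence, averaged over both directions.
    best_ab = sum(max(sim(x, y) for y in b) for x in a)
    best_ba = sum(max(sim(y, x) for x in a) for y in b)
    matched = (best_ab + best_ba) / 2
    return matched / (len(a) + len(b) - matched)


def remove_duplicates(ranked: Sequence[int], sentences, k: int,
                      threshold: float = 0.5, sim=soft_jaccard) -> List[int]:
    """Walk candidates in descending importance, keeping up to k non-duplicates.

    ranked: sentence indices ordered from most to least important.
    """
    chosen: List[int] = []
    for i in ranked:
        if all(sim(sentences[i], sentences[j]) < threshold for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return sorted(chosen)
```

For example, with the sentences `["我", "買", "電腦"]` and `["我", "買", "計算機"]`, the synonym entry pushes their similarity well above the threshold, so only the higher-ranked one survives and the freed slot goes to the next distinct sentence.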
    The results show that a combined sentence feature, using the sum of TFIDF weights together with similarity to the title sentence, improves precision by 7%. For duplicate-sentence elimination, a similarity measure based on Jaccard and HowNet also improves the precision of the automatic summaries.
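The combined feature mentioned above can be sketched as a weighted sum of a normalized TFIDF score and the sentence's similarity to the title. The abstract does not say how the two are combined, so the linear weighting, the hypothetical `alpha` parameter, normalization by the maximum score, and plain Jaccard similarity to the title are all assumptions made for illustration.

```python
from typing import List, Sequence


def jaccard(s1: Sequence[str], s2: Sequence[str]) -> float:
    """Plain Jaccard similarity between the word sets of two sentences."""
    a, b = set(s1), set(s2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_scores(tfidf_scores: Sequence[float],
                    sentences: Sequence[Sequence[str]],
                    title: Sequence[str],
                    alpha: float = 0.5) -> List[float]:
    """alpha * (TFIDF score / max TFIDF score) + (1 - alpha) * similarity to title."""
    top = max(tfidf_scores, default=0.0)
    out = []
    for score, sent in zip(tfidf_scores, sentences):
        norm = score / top if top > 0 else 0.0  # scale TFIDF sums into [0, 1]
        out.append(alpha * norm + (1 - alpha) * jaccard(sent, title))
    return out
```

Normalizing the TFIDF sum keeps both terms on a [0, 1] scale, so `alpha` reads directly as the relative weight of content importance versus closeness to the title.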
    Appears in Collections: [Department of Information Management and Graduate Institute] Theses and Dissertations

    All items in the institutional repository (機構典藏) are protected by copyright, with all rights reserved.
