

    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/114662


    Title: 中文新聞自動摘要產生系統
    Other title: Automatically generate abstract for Chinese news
    Author: 莊秉哲 (Chuang, Ping-Che)
    Contributors: Tamkang University, Master's Program, Department of Computer Science and Information Engineering;
    徐郁輝 (Shyu, Yuh-Huei)
    Keywords: Automatic Abstract; Chinese Word Segmentation; Network News; Information Retrieval; Big Data
    Date: 2017
    Uploaded: 2018-08-03 14:59:58 (UTC+8)
    Abstract: With the rapid growth of the Internet, browsing news media websites and reading news online have become one of the main online activities for many people. Yet a large volume of news is published every day, causing information overload. Readers usually read only the stories that are important to them or that interest them, and at most skim the headlines of the rest. The skimmed stories may contain information readers want, but they go unread because their headlines are not compelling enough. Archiving articles from different news websites and automatically condensing long articles into concise abstracts would save readers a great deal of reading time.

    This thesis proposes a method for automatically collecting Chinese news and summarizing it. First, the title, category, and body text of each article are extracted from news websites. The text is then split into words and sentences by Chinese word segmentation against a self-defined lexical database. Next, weighting techniques from information retrieval are used to identify the proper nouns and keywords in the article, and a weight is computed for each sentence. The words of the article's title then serve as indicators for computing each sentence's significance factor. Finally, the two values are summed to give a new sentence weight, which drives the extraction of important sentences: sentences are selected by weight and arranged in their original article order to produce the abstract of the Chinese news article.
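    The abstract describes the pipeline only at a high level. The Python sketch below shows one way the segmentation, weighting, title-based significance factor, and sentence-extraction steps could fit together. It is not the thesis's implementation. Several choices are assumptions made for illustration: forward maximum matching as the segmentation algorithm; TF-IDF (with smoothed IDF over a small background corpus) as the information-retrieval weighting; and a Luhn-style significance factor that treats title words as the significant words. The tiny `LEXICON`, the helper names, and the sample sentences are all hypothetical, and the web-crawling step is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical self-defined lexicon (illustration only). The thesis builds its
# own lexical database, whose contents the abstract does not give.
LEXICON = {"台北", "捷運", "新路線", "今天", "通車", "市長",
           "表示", "乘客", "增加", "票價", "不變"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def split_sentences(text):
    """Split after Chinese sentence-ending punctuation, keeping the delimiter."""
    return [s for s in re.split(r"(?<=[。！？])", text) if s.strip()]


def segment(text, lexicon=LEXICON, max_len=MAX_WORD_LEN):
    """Forward maximum matching (assumed algorithm): at each position take the
    longest lexicon word, falling back to a single character otherwise."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def significance_factor(words, title_terms):
    """Luhn-style factor with title words as the significant words:
    (number of title-word hits)^2 / (word span from first to last hit)."""
    hits = [i for i, w in enumerate(words) if w in title_terms]
    if not hits:
        return 0.0
    return len(hits) ** 2 / (hits[-1] - hits[0] + 1)


def summarize(title, body, corpus, k=2):
    """Score each sentence by TF-IDF weight + title significance factor,
    keep the top k, and return them in their original article order."""
    sentences = split_sentences(body)
    words = [segment(s) for s in sentences]
    # Only lexicon entries count as candidate keywords / proper nouns.
    terms = [[w for w in ws if w in LEXICON] for ws in words]

    tf = Counter(t for ts in terms for t in ts)
    total = sum(tf.values()) or 1
    docs = [{w for w in segment(d) if w in LEXICON} for d in corpus]

    def idf(term):
        # Smoothed IDF over a background corpus of other articles (assumed).
        df = sum(1 for d in docs if term in d)
        return math.log((1 + len(docs)) / (1 + df)) + 1

    weight = {t: (c / total) * idf(t) for t, c in tf.items()}
    title_terms = {w for w in segment(title) if w in LEXICON}

    # New sentence weight = TF-IDF sentence weight + significance factor.
    scores = [sum(weight[t] for t in ts) + significance_factor(ws, title_terms)
              for ts, ws in zip(terms, words)]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return "".join(sentences[i] for i in sorted(top))


if __name__ == "__main__":
    title = "台北捷運新路線今天通車"
    body = "台北捷運新路線今天通車。市長表示乘客將會增加。票價不變。天氣晴朗。"
    corpus = ["台北今天下雨。", "市長表示票價不變。"]
    print(summarize(title, body, corpus, k=2))
```

    Run as a script, the example prints `台北捷運新路線今天通車。市長表示乘客將會增加。`: the first sentence wins through its title-word significance factor, and the second has the highest TF-IDF weight among the rest. Summing the two scores without any coefficient mirrors the abstract's description. Because the two terms live on different scales, a real system might weight one relative to the other.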
    Appears in collections: [Department and Graduate Institute of Computer Science and Information Engineering] Theses

    All items in the institutional repository are protected by their original copyright.
