中文新聞自動摘要產生系統 (Automatic Summary Generation System for Chinese News)

Please use this permanent URL to cite or link to this document: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/114662

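Title:	中文新聞自動摘要產生系統 (Automatic Summary Generation System for Chinese News)
Alternative title:	Automatically generate abstract for Chinese news
Author:	Chuang, Ping-Che (莊秉哲)
Contributor:	Master's Program, Department of Computer Science and Information Engineering, Tamkang University; Shyu, Yuh-Huei (徐郁輝)
Keywords:	Automatic Abstract; Chinese Word Segmentation; Network News; Information Retrieval; Big Data
Date:	2017
Uploaded:	2018-08-03 14:59:58 (UTC+8)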
Abstract:	With the rapid growth of the Internet, browsing news websites and reading news online has become a main online activity for many people. However, a large volume of news is produced every day, which has led to information overload. Readers usually read only the news they consider important or interesting and, at most, glance at the headlines of the rest. The articles they skip may contain information they want, yet go unread because the headlines are not well written. Saving articles from different news websites and automatically condensing lengthy articles into concise summaries can save readers a great deal of reading time. This thesis proposes a method that automatically collects Chinese news and generates summaries. First, the title, category, and body of each news article are extracted from the website. Next, Chinese word segmentation, based on a self-defined vocabulary database, splits the text into words and sentences. An information-retrieval weighting technique is then used to identify the proper nouns and keywords in the article, and a weight is computed for each sentence. After that, the words in the article title are used as indicators to find each sentence's significance factor. Finally, the two values are summed to give a new sentence weight, which drives the extraction of important sentences: sentences are selected by weight and arranged in article order to produce the automatic summary of the Chinese news article.
Appears in collections:	[Department of Computer Science and Information Engineering] Theses and Dissertations
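
The pipeline described in the abstract (segment against a custom lexicon, weight terms, score sentences, add a title-based factor, then pick the top sentences in article order) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the record does not state which segmentation algorithm or weighting formula was used, so forward maximum matching and plain in-article term frequency are assumptions, and the names `segment`, `split_sentences`, `summarize`, `max_len`, and `top_n` as well as the sample lexicon and article are hypothetical.

```python
import re
from collections import Counter


def segment(text, lexicon, max_len=4):
    """Forward maximum matching against a user-defined lexicon (assumed method)."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def split_sentences(body):
    """Split on sentence-final punctuation and drop empty fragments."""
    return [s for s in re.split(r"[。！？!?]", body) if s.strip()]


def summarize(title, body, lexicon, top_n=2):
    sentences = split_sentences(body)
    # Keep only lexicon terms; unknown single characters act as noise.
    seg = [[t for t in segment(s, lexicon) if t in lexicon] for s in sentences]
    # Term weight: frequency within the article (stand-in for the IR weighting).
    tf = Counter(t for tokens in seg for t in tokens)
    title_terms = {t for t in segment(title, lexicon) if t in lexicon}
    scores = []
    for tokens in seg:
        term_weight = sum(tf[t] for t in tokens)
        # Title factor: number of sentence terms that also occur in the title.
        title_factor = sum(1 for t in tokens if t in title_terms)
        scores.append(term_weight + title_factor)
    # Select the highest-scoring sentences, then restore article order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]


# Hypothetical lexicon and article, for illustration only.
lexicon = {"颱風", "台北", "停班", "停課", "市政府", "宣布", "風雨"}
title = "颱風來襲台北停班停課"
body = "颱風今天接近台灣。台北市政府宣布明天停班停課。民眾到超市買菜。氣象局提醒風雨將增強。"
summary = summarize(title, body, lexicon)
print(summary)
```

With this toy lexicon, the two sentences sharing the most terms with the title score highest and are returned in their original order. A production system would likely replace the term-frequency weight with a corpus-level scheme such as TF-IDF, which needs document frequencies from the collected news archive.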

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	167

All items in the institutional repository are protected by their original copyright.
