Research on Extraction of Conference Information Based on Regular Expressions


Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/74412

Title:	以正規表示式萃取研討會資訊之研究
Other Titles:	Research on extraction of conference information based on regular expressions
Authors:	陳康毅;Chen, Kang-Yi
Contributors:	Master's Program, Department of Information Management, Tamkang University (淡江大學資訊管理學系碩士班); 魏世杰; Wei, Shih-Chieh
Keywords:	information extraction; conference information; regular expression
Date:	2011
Issue Date:	2011-12-28 18:34:33 (UTC+8)
Abstract:	When looking for conferences to submit to, researchers often need to know which conferences on their topic still have open submission deadlines. Given budget and time constraints, they must also check that the conference location and dates suit them. Although conference information can be found through search engines, users still have to enter keywords and then inspect the returned pages one by one to find the conferences that best meet their needs, which takes considerable time. This work implements a conference query-assistance system. The user enters topic keywords; the system sends them to a search engine and fetches the web pages describing related conferences. Using regular expressions, the system analyzes the context surrounding the text in each page and extracts six items that submitters typically care about: the conference title, topics, submission deadline, conference date, location, and URL. A graphical user interface lets the user enter a simple query and browse the results in a table, where each row summarizes the six fields extracted from one returned page. To verify the extracted information, the user can click a row to view the original page as plain text or open it in a browser. The system also supports sorting by any field and manually reordering rows, so the user can compile a ranked list of suitable conferences for export or printing. The system is intended to help researchers find suitable conferences more efficiently and reduce the time spent opening pages one by one. Finally, the accuracy of the extracted information is evaluated on results returned by the Google search engine. The experiments indicate that the six extracted items perform well in terms of precision and recall.
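The extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the cue phrases, the single date format, the field names, and the helpers `extract_fields` and `is_open` are all assumptions made for illustration, and only three of the six fields are covered. It also assumes English month names, which `strptime` expects under the default locale.

```python
import re
from datetime import date, datetime

# Hypothetical patterns for illustration; the thesis's actual expressions
# are not given in this record.
MONTH = (r"(?:January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")
DATE = rf"{MONTH}\s+\d{{1,2}},\s+\d{{4}}"  # e.g. "March 15, 2012"

# Each pattern anchors on a cue phrase that precedes the value, i.e. it
# relies on the text surrounding the field rather than on page layout.
PATTERNS = {
    "deadline": re.compile(
        rf"(?:submission\s+deadline|paper\s+due)\s*[:\-]?\s*({DATE})", re.I),
    "date": re.compile(
        rf"(?:conference\s+dates?|held\s+on)\s*[:\-]?\s*({DATE})", re.I),
    "location": re.compile(r"(?:location|venue)\s*[:\-]\s*([^\n.]+)", re.I),
}


def extract_fields(text: str) -> dict:
    """Return the first match for each field, or None when no cue is found."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1).strip() if match else None
    return result


def is_open(deadline: str, today: date) -> bool:
    """True if a deadline written like 'March 15, 2012' has not yet passed."""
    return datetime.strptime(deadline, "%B %d, %Y").date() >= today
```

Calling `extract_fields` on the plain text of a fetched page yields a dictionary that could fill one table row, and `is_open` could then filter out conferences whose deadline has passed. A real system would need many more date formats and cue phrases than this sketch handles.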
Appears in Collections:	[Department and Graduate Institute of Information Management] Theses and Dissertations
