The Design of a Syntax-Based Retrieval System for Mining Movies (應用語法搜尋於電影採礦之設計)

Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/35004

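Title:	應用語法搜尋於電影採礦之設計
Other Titles:	The designing of a syntax-based retrieval system for mining movies
Author:	張振富;Chang, Chen-fu
Contributors:	Tamkang University, Department of Computer Science and Information Engineering, Master's Program; 郭經華;Kuo, Chin-hwa
Date:	2005
Uploaded:	2010-01-11 05:53:25 (UTC+8)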
Abstract:	This thesis presents a movie retrieval system that supports queries on English grammar. English teachers can use the system to prepare teaching materials that show students grammar patterns commonly used in daily life. To support grammar queries, the movie subtitles are preprocessed: they are POS-tagged and lemmatized, and the resulting annotations are stored in XML so that regular expressions can be matched against them. To return a complete movie clip containing each search result, the system also performs scene detection with a simple image-similarity method: the similarity of two consecutive frames determines whether a scene change has occurred. Because regular expressions serve as the query language and matching them is time-consuming, the subtitles are indexed to reduce the number of sentences each expression must be matched against. Index construction uses k-gram indexing, which comprises k-gram splitting, useful-index selection, and the presuf-free set.
In the experiments, we compared the number of index entries and the search time without an index, with the index produced by k-gram splitting, and with the presuf-free index. Analysis of the results confirmed that the presuf-free set substantially reduces the number of index entries, and that the time spent on regular-expression matching falls accordingly. K-gram index construction is therefore central to this retrieval system, and it is the main contribution of this thesis to index construction for searching large volumes of data.
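The index-then-match idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it indexes every character k-gram and omits the useful-index and presuf-free pruning steps, and it matches raw subtitle text rather than POS-tagged XML. The names `kgrams`, `build_index`, and `search`, and the `required_literal` parameter, are hypothetical; the caller is assumed to supply a literal substring that every match of the regular expression must contain.

```python
import re
from collections import defaultdict


def kgrams(text, k):
    """Return the set of all character k-grams in text (empty if text is shorter than k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def build_index(sentences, k=3):
    """Map each lowercased k-gram to the ids of the sentences that contain it."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for gram in kgrams(sentence.lower(), k):
            index[gram].add(sid)
    return index


def search(sentences, index, pattern, required_literal, k=3):
    """Run the regex only on sentences that contain every k-gram of required_literal.

    required_literal is an assumption of this sketch: a substring that any
    match of pattern must contain, chosen by the caller.
    """
    grams = kgrams(required_literal.lower(), k)
    if grams:
        # Intersect the posting lists: a candidate must contain all grams.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Literal shorter than k: no filtering possible, scan every sentence.
        candidates = set(range(len(sentences)))
    regex = re.compile(pattern, re.IGNORECASE)
    return [sentences[sid] for sid in sorted(candidates) if regex.search(sentences[sid])]


# Hypothetical subtitle lines, used only for illustration.
subtitles = [
    "I have been waiting for you.",
    "She has gone home.",
    "We have been here before.",
    "Have you seen him?",
]
index = build_index(subtitles, k=3)
hits = search(subtitles, index, r"\bhave been \w+ing\b", "have been", k=3)
print(hits)
```

The filter is conservative: it can admit sentences that the regular expression later rejects, but it never discards a sentence containing the required literal, so the final results equal those of a full scan.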
Appears in Collections:	[Department of Computer Science and Information Engineering] Theses and Dissertations

Files in This Item:

File	Size	Format	Views
	0Kb	Unknown	397

All items in the institutional repository are protected by original copyright.
