During implementation, we compared the number of indices and the search time for three configurations: no index, the complete index obtained by splitting the text into k-grams, and the presuf-free index. Analysis of the experimental data confirms that the presuf-free index substantially reduces the number of indices. With fewer indices, the time spent on regular-expression matching also decreases. In this movie retrieval system, constructing the k-gram index is therefore critical, and it is the main contribution of this paper to index construction for searching large amounts of data.

This paper discusses how to build a movie retrieval system that can search for English grammar patterns. English teachers can use the system to design teaching materials that give students examples of grammar as it is used in daily life. To support grammar search over movies, the movie subtitles are processed before any user query: they are POS-tagged and lemmatized, and the resulting POS tags and lemmas are saved in XML format. To return a movie clip with each syntax result, our system also detects movie scene changes based on image similarity.
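The abstract states that POS tags and lemmas are stored as XML but does not give the schema. The sketch below, assuming a hypothetical layout of one `<subtitle>` element per subtitle line and one `<w>` element per token carrying `pos` and `lemma` attributes, shows how tagger output could be serialized with Python's standard `xml.etree.ElementTree`. The element names, the function `subtitle_to_xml`, and the sample tokens are illustrative only; tagging and lemmatization are assumed to have been done by an external tool.

```python
import xml.etree.ElementTree as ET

def subtitle_to_xml(sub_id, start, end, tokens):
    """Build one <subtitle> element from (word, pos, lemma) triples.

    Hypothetical schema: element and attribute names are illustrative,
    not the system's actual XML format.
    """
    sub = ET.Element("subtitle", id=str(sub_id), start=start, end=end)
    for word, pos, lemma in tokens:
        # One <w> element per token; POS tag and lemma stored as attributes.
        w = ET.SubElement(sub, "w", pos=pos, lemma=lemma)
        w.text = word
    return sub

# Tokens as an external POS tagger and lemmatizer might produce them
# (Penn Treebank tags); these values are made up for illustration.
tokens = [("I", "PRP", "I"), ("have", "VBP", "have"),
          ("been", "VBN", "be"), ("waiting", "VBG", "wait")]
root = ET.Element("movie", title="example")
root.append(subtitle_to_xml(1, "00:01:02,000", "00:01:04,500", tokens))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Keeping the lemma next to each surface word lets a grammar query such as "a form of *be* followed by a VBG token" be matched without re-running the tagger at query time.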
When regular expressions are used as the query language, pattern matching is time-consuming. Therefore, we build an index over the movie subtitles to reduce the search time. For index construction, we adopt k-gram indexing, which involves the complete k-gram index, the useful index, and the presuf-free set. In addition, we use the similarity of two consecutive frames to detect scene changes.
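The abstract names the useful index and the presuf-free set without defining them. The following is a minimal Python sketch, assuming the multigram scheme of the FREE regular-expression index: a gram is *useful* if it occurs in at most a fraction `c` of the documents; only useful grams with no useful proper prefix are kept (prefix-free); then a gram is dropped if one of its proper suffixes is also kept (presuf-free). The function name `build_presuf_free_index` and the parameters `k_max` and `c` are hypothetical, and the thesis's exact definitions may differ.

```python
from collections import defaultdict

def build_presuf_free_index(docs, k_max=4, c=0.5):
    """Sketch of a multigram index: useful, prefix-free, then presuf-free.

    docs:  list of strings (e.g. linearized subtitle lines); list positions
           serve as document ids.
    k_max: hypothetical maximum gram length.
    c:     hypothetical selectivity threshold; a gram is "useful" when it
           occurs in at most c * len(docs) documents.
    """
    n = len(docs)
    index = {}       # gram -> set of doc ids, for kept (useful) grams
    expand = {""}    # useless grams of the previous length, still extendable
    for k in range(1, k_max + 1):
        postings = defaultdict(set)
        for doc_id, text in enumerate(docs):
            for i in range(len(text) - k + 1):
                g = text[i:i + k]
                # Only extend grams whose (k-1)-prefix was useless, so every
                # proper prefix of g is useless (prefix-free property).
                if g[:-1] in expand:
                    postings[g].add(doc_id)
        next_expand = set()
        for g, ids in postings.items():
            if len(ids) <= c * n:
                index[g] = ids        # useful: keep it, do not extend it
            else:
                next_expand.add(g)    # useless: try longer grams next round
        expand = next_expand
    # Presuf-free step: drop a gram if one of its proper suffixes is indexed.
    return {g: ids for g, ids in index.items()
            if not any(g[j:] in index for j in range(1, len(g)))}

docs = ["ax", "a", "a", "b"]
print(build_presuf_free_index(docs, k_max=3, c=0.5))
```

In this small example, `"ax"` is useful and its prefix `"a"` is useless, so the prefix-free step keeps it; the presuf-free step then removes it because its suffix `"x"` is already indexed and selects the same document. This is how the presuf-free set shrinks the number of index keys without losing filtering power.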
To test the actual system, we compare the number of indices and the search time for a full scan (no index), the complete index, and the presuf-free index. After examining and analyzing the results, we conclude that constructing the k-gram index reduces both the number of indices and the search time. In this paper, we show that constructing the k-gram index before users search makes a concrete contribution to search over large database systems.
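The comparison above rests on one idea: a gram index only narrows the set of candidate subtitles, and the regular expression is then run on those candidates alone instead of on every subtitle. A minimal Python sketch follows, assuming the literal strings that every match must contain are supplied by hand; they stand in for a hypothetical regex-to-literal extraction step. The functions `search` and `full_scan` are illustrative names, not the system's API.

```python
import re

def search(docs, index, pattern, literals):
    """Filter candidates with a gram index, then confirm with the regex.

    index:    gram -> set of doc ids (any gram index, complete or presuf-free).
    literals: strings every match must contain (hypothetical: supplied by
              hand instead of being extracted from the regex).
    """
    candidates = set(range(len(docs)))
    for lit in literals:
        # Any indexed gram inside a required literal must occur in every
        # matching document, so intersecting its posting list is safe.
        for i in range(len(lit)):
            for j in range(i + 1, len(lit) + 1):
                ids = index.get(lit[i:j])
                if ids is not None:
                    candidates &= ids
    regex = re.compile(pattern)
    return sorted(i for i in candidates if regex.search(docs[i]))

def full_scan(docs, pattern):
    """Baseline without an index: run the regex on every document."""
    regex = re.compile(pattern)
    return [i for i, d in enumerate(docs) if regex.search(d)]

docs = ["I have been waiting", "she has gone",
        "we have been there", "been and gone"]
# Complete 2-gram index over the sample subtitles.
index = {}
for doc_id, d in enumerate(docs):
    for j in range(len(d) - 1):
        index.setdefault(d[j:j + 2], set()).add(doc_id)

pattern = r"\bha(ve|s) been\b"
print(search(docs, index, pattern, ["ha", " been"]), full_scan(docs, pattern))
```

Both calls return the same documents, but `search` runs the regex only on the candidates that survive the posting-list intersection. With a smaller index such as the presuf-free set, fewer posting lists need to be stored and intersected.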