以此為題，首先提出FAL演算法，掃描資料庫一次，應用向下封閉特性(Downward Closure Property)建立序列晶格，再依據向上封閉特性(Upward Closure Property)刪除晶格中不頻繁序列，建立完全最大序列(Maximal Sequence)晶格，有效縮小晶格所佔用之空間。其次，提出FMCSP演算法，以等效類別(Equivalent Class)的觀念，建立封閉序列(Closed Sequence)晶格。然而，實際運轉的資料庫都會是動態的、時變的，無法避免繼續新增資料的產生。為此，針對時變遞增的資料庫如何維護序列型樣資料模型?!本論文提出MMSP演算法，探討挖掘序列型樣演算法之遞增特性，充分運用已經挖掘之資訊，減少重複挖掘工作。
Since the invention of characters, all kinds of records with numbers and words, increased dramatically in many domains. The invention of computer had trigger rapid data accumulation in science, education, e-Learning, business and supermarket, exponentially. Automation of information system has urged this situation and result in huge amount of data stored in databases worldwide. These electronic data is considered to be the mirror of the real world, which we can make full use it, properly. We try to discover interesting information or knowledge that conceived in databases via various methods, such as statistic, query, graphics and data mining, to further understand the world we lived in. A modern day information system is accountable for this purpose.
In these diverse approaches, data mining has caught the eyes of many domain experts, and gain achievements. In the last decade, mining sequential patterns became one promising topic and arouse our interest. The essence of data mining is to dealing with huge amount of data, many previous researches focus on propose efficient algorithm with less search time and run time. Hence, this dissertation has focused on develop algorithms that mining sequential pattern efficiently, and to be maintainable. In our point of view there are two criteria to evaluate mining algorithm: scan times of database, volume of working space and searching space. As we already known the speed of accessing data from hard drive is slower than from main memory, usually by factor of two. This implies that the less scan times the better. Working space is required to host data during mining process. Searching space is the space to store the result, frequent sequence set or data model. The less space required by algorithm the more chance to fit all processed data into main memory. Consequently, both the efficiency and performance will be improved. With these in mind, all three algorithms been proposed in this dissertation have these three characters: scan database once, mining without candidates and mining full set of frequent sequences.
First algorithm, FAL, is designed to fully utilize both the downward closure property and upward closure property to construct a lattice data model with maximal sequence representation. Second algorithm is FMCSP that inherit the legacy of FAL, but applied closed sequence concept instead of maximal sequence. Note that, closed sequence is the longest sequence in its equivalent class, it can shrink the size of searching space, and furthermore, with adjustable ability for user to set, or tune, the threshold of minimum support after the mining of data model had been constructed. Finally, algorithm MMSP has inherited the legacy of previous algorithms to deal with incremental sequence database. MMSP is capable to handle incremental data added into data model one by one and batch data without rerun whole database from scratch.