CHSPM: A Complete Hybrid Sequential Patterns Mining Algorithm


Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/34130

Title:	CHSPM: A Complete Hybrid Sequential Patterns Mining Algorithm
Other Titles:	CHSPM:一個完整的混合循序樣式探勘演算法
Authors:	Yuan, Hsiau-ren (原孝任)
Contributors:	Tamkang University, Master's Program, Department of Information Management; Jou, Chichang (周清江)
Date:	2005
Issue Date:	2010-01-11 04:57:17 (UTC+8)
Abstract:	Existing research on sequential pattern mining can be roughly divided into three categories, according to whether items that are adjacent in a pattern must also be adjacent in the transaction records: the first finds continuous sequential patterns; the second finds discontinuous sequential patterns; the third finds hybrid sequential patterns, which combine continuous and discontinuous patterns. Previous hybrid sequential pattern mining algorithms were all based on the Apriori algorithm, but their mining results are incomplete. We therefore propose a new algorithm for hybrid sequential pattern mining, CHSPM (A Complete Hybrid Sequential Patterns Mining Algorithm), which is based on the pattern-growth method and uses exhaustive enumeration to find the complete set of hybrid sequential patterns. CHSPM has four steps: 1. Build the supplemented frequent 1-sequence item set; 2. Reduce the database by erasing unimportant items from the transactions; 3. Partition the database and build projected databases; 4. Recursively mine the projected databases and build sub-projected databases until all hybrid sequential patterns are found. To verify CHSPM's mining results, we ran experiments on synthetic databases of 100,000 to 300,000 records and compared CHSPM with GFP2, previously the most efficient algorithm for hybrid sequential pattern mining. The results show that although CHSPM is slower than GFP2, it finds the complete set of hybrid sequential patterns.
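
The three pattern categories named in the abstract differ only in how a pattern is matched against a transaction sequence, which a small example can make concrete. The Python sketch below is illustrative only and is not taken from the thesis: the segment-list representation of a hybrid pattern, the function names `find_block`, `contains`, and `support`, and the choice to allow a gap of zero or more items between segments are all assumptions made for this example. It does not implement CHSPM or GFP2.

```python
from typing import List, Sequence

# Assumed representation (not the thesis's notation): a hybrid pattern is a
# list of segments. Items inside one segment must be adjacent in the
# sequence; consecutive segments may be separated by zero or more items.
Pattern = List[List[str]]


def find_block(seq: Sequence[str], block: Sequence[str], start: int) -> int:
    """Return the first index >= start where block occurs contiguously, else -1."""
    n, m = len(seq), len(block)
    for i in range(start, n - m + 1):
        if list(seq[i:i + m]) == list(block):
            return i
    return -1


def contains(seq: Sequence[str], pattern: Pattern) -> bool:
    """Check whether seq contains the pattern, matching each segment greedily."""
    pos = 0
    for block in pattern:
        i = find_block(seq, block, pos)
        if i < 0:
            return False
        # The next segment may start anywhere at or after the end of this one.
        pos = i + len(block)
    return True


def support(db: Sequence[Sequence[str]], pattern: Pattern) -> int:
    """Count the sequences in db that contain the pattern."""
    return sum(1 for seq in db if contains(seq, pattern))
```

Under this representation, a continuous pattern is a single segment such as `[["a", "b"]]`, a discontinuous pattern uses one item per segment such as `[["a"], ["b"]]`, and a hybrid pattern mixes the two, such as `[["a", "b"], ["d"]]`. Matching each segment at its earliest possible position is sufficient here because the gaps between segments are unbounded.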
Appears in Collections:	[Department and Graduate Institute of Information Management] Theses and Dissertations

