Fuzzy temporal association rule acquisition technique with membership function tuning mechanism

Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/114717

Title:	Fuzzy temporal association rule acquisition technique with membership function tuning mechanism
Alternative title:	具隸屬函數調整機制的模糊時序關聯規則萃取技術
Author:	周翔; Chou, Hsiang
Contributors:	Tamkang University, Department of Computer Science and Information Engineering, English-taught Master's Program; 陳俊豪
Keywords:	Clustering; fuzzy temporal association rule; Genetic Algorithm; item lifespan; membership functions
Date:	2017
Uploaded:	2018-08-03 15:01:43 (UTC+8)
Abstract:	In real-world applications, transactions are far more often recorded as quantitative data than as binary data. Fuzzy association rule mining algorithms have thus been proposed to handle quantitative transactions. In addition, items have certain lifespans, or temporal periods, during which they exist in a database, and approaches have also been presented to mine fuzzy temporal association rules (FTARs). A key factor in the acquisition of fuzzy rules is the selection of appropriate membership functions. Although many approaches have been designed to generate membership functions, no existing approach addresses the problem of generating membership functions for mining FTARs in market basket analysis. In this thesis, we propose two approaches with a membership function tuning mechanism to discover FTARs. The first approach uses a clustering method to generate membership functions tailored to each item in a data set. 
The membership functions are based on each individual item's quantitative range, so the generated membership functions differ between items not only in the values of each interval but also in the number of intervals. Two factors are instrumental in deciding each item's membership functions: density similarity among intervals, which measures how similar the intervals are in density, and information closeness, which measures how similar the intervals are in the number of data points they contain. A parameter θ indicates the relative importance of the two factors. Finally, the derived membership functions are employed in a fuzzy temporal rule mining algorithm to generate association rules. In addition, to speed up the mining process, a Fuzzy FP-growth approach is used in both methods. Because different θ values greatly affect the set of membership functions produced, the second approach incorporates a genetic algorithm to search automatically for the value of θ that produces the largest number of diverse rules. Each candidate θ value is encoded as a bit string. The fitness function combines two factors, the number of rules generated and the diversity of the rules, where diversity is evaluated by the average number of membership functions per item. If an item has more membership functions, each function covers a narrower range, so the rules generated are more specific. Experiments were carried out on one simulated dataset and one real dataset to show the effectiveness of the proposed approaches. On the simulated dataset, the first proposed approach greatly outperformed the previous approach, which uses predefined membership functions, in terms of both the number of rules and the diversity of rules. 
Since the real data set was made up of data with widely differing quantitative values, the proposed approach generated fewer rules on it than the existing approach, but those rules relate to much more specific fuzzy regions, making them more diverse and useful. For the second approach, the experimental results show that the genetic algorithm could find a suitable value for θ that produces a large number of diverse rules. These rules were used to uncover interesting information in the datasets.
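The bit-string encoding and the fitness function mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. The abstract does not say how the bit string is decoded, what range θ takes, or how the two fitness factors are combined, so the [0, 1] range, the linear decoding, and the product form below are all assumptions. The names `decode_theta`, `rule_diversity`, and `fitness` are hypothetical.

```python
from typing import Dict, Sequence


def decode_theta(bits: Sequence[int], lo: float = 0.0, hi: float = 1.0) -> float:
    """Map a bit string to a real value in [lo, hi].

    ASSUMPTION: plain binary decoding scaled linearly onto [lo, hi];
    the abstract only states that theta is encoded as a bit string.
    """
    if not bits or any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be a non-empty sequence of 0/1 values")
    as_int = int("".join(str(b) for b in bits), 2)
    max_int = (1 << len(bits)) - 1  # largest integer the string can encode
    return lo + (hi - lo) * as_int / max_int


def rule_diversity(mf_counts: Dict[str, int]) -> float:
    """Average number of membership functions per item (the abstract's diversity measure)."""
    if not mf_counts:
        raise ValueError("mf_counts must not be empty")
    return sum(mf_counts.values()) / len(mf_counts)


def fitness(num_rules: int, mf_counts: Dict[str, int]) -> float:
    """Combine rule count and rule diversity into one score.

    ASSUMPTION: the two factors are multiplied; the abstract says only
    that the fitness is "a combination" of them.
    """
    return num_rules * rule_diversity(mf_counts)


if __name__ == "__main__":
    theta = decode_theta([1, 0, 0, 0])      # 4-bit chromosome
    counts = {"milk": 3, "bread": 5}        # hypothetical items and counts
    print(theta, fitness(10, counts))
```

In a full genetic algorithm, each chromosome would be decoded to θ, the clustering step would be run with that θ to derive the per-item membership functions, the rules would be mined, and the resulting rule count and membership function counts would be passed to the fitness function.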
Appears in collections:	[Department of Computer Science and Information Engineering] Theses and dissertations

All items in the institutional repository are protected by their original copyright.
