In real-world applications, transactions far more commonly contain quantitative data than binary data. Fuzzy association rule mining algorithms have thus been proposed to handle quantitative transactions. In addition, items have certain life spans, or temporal periods during which they exist in a database, and approaches have also been presented to mine fuzzy temporal association rules (FTARs). A key factor in the acquisition of fuzzy rules is the selection of appropriate membership functions. Although many approaches have been designed to generate membership functions, there is currently no approach that addresses the problem of generating membership functions for mining FTARs in market basket analysis. In this thesis, we propose two approaches with a membership function tuning mechanism to discover FTARs.
The first approach uses a clustering method to generate membership functions tailored specifically to each item in a dataset. Each item's membership functions are derived from that item's quantitative range, so the generated membership functions differ not only in the values of their intervals but also in the number of intervals. Two factors determine each item's membership functions: density similarity among intervals, which corresponds to the similarity in the density of the intervals, and information closeness within an interval, which corresponds to the similarity in the number of data points between intervals. A parameter θ indicates the relative importance of the two factors. Finally, the derived membership functions are employed in a fuzzy temporal rule mining algorithm to generate association rules. In addition, to speed up the mining process, the Fuzzy FP-growth approach is used in both proposed approaches.
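The abstract does not spell out the clustering procedure, so the following is only a minimal sketch under assumed details. The purchase quantities of one item are first split into one cell per distinct value. Adjacent cells are then greedily merged while a θ-weighted score exceeds a threshold, and each resulting interval becomes a triangular membership function. The score combines density similarity with count similarity, the latter standing in for "information closeness" as the abstract defines it. The ratio-based similarity measure, the merge threshold of 0.8, the greedy merge order, the midpoint cell boundaries, and the triangular shape are all hypothetical choices, not the thesis's method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Interval:
    lo: float
    hi: float
    count: int

    @property
    def density(self) -> float:
        # Data points per unit of quantity covered by the interval.
        return self.count / (self.hi - self.lo)


def ratio_similarity(a: float, b: float) -> float:
    """Similarity in [0, 1]: 1.0 when a == b, smaller as they diverge."""
    return 1.0 if a == b else min(a, b) / max(a, b)


def merge_score(x: Interval, y: Interval, theta: float) -> float:
    # Hypothetical form: theta weights density similarity against
    # similarity in the number of data points (information closeness).
    density_sim = ratio_similarity(x.density, y.density)
    count_sim = ratio_similarity(x.count, y.count)
    return theta * density_sim + (1.0 - theta) * count_sim


def initial_intervals(values):
    """One cell per distinct value, bounded by midpoints to its neighbours."""
    counts = Counter(values)
    pts = sorted(counts)
    if len(pts) == 1:
        v = pts[0]
        return [Interval(v - 0.5, v + 0.5, counts[v])]  # assumed unit width
    mids = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    bounds = ([pts[0] - (pts[1] - pts[0]) / 2] + mids
              + [pts[-1] + (pts[-1] - pts[-2]) / 2])
    return [Interval(bounds[i], bounds[i + 1], counts[p]) for i, p in enumerate(pts)]


def cluster_intervals(values, theta, threshold=0.8):
    intervals = initial_intervals(values)
    while len(intervals) > 1:
        scores = [merge_score(intervals[i], intervals[i + 1], theta)
                  for i in range(len(intervals) - 1)]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < threshold:
            break  # no adjacent pair is similar enough to merge
        a, b = intervals[best], intervals[best + 1]
        intervals[best:best + 2] = [Interval(a.lo, b.hi, a.count + b.count)]
    return intervals


def triangular_mfs(intervals):
    """(left, center, right) triangles peaking at each interval's midpoint."""
    centers = [(iv.lo + iv.hi) / 2 for iv in intervals]
    last = len(centers) - 1
    return [(centers[max(i - 1, 0)], c, centers[min(i + 1, last)])
            for i, c in enumerate(centers)]


def membership(x, mf):
    left, center, right = mf
    if x <= center:
        if left == center:  # left shoulder: full membership below first center
            return 1.0
        return max(0.0, (x - left) / (center - left))
    if right == center:  # right shoulder: full membership above last center
        return 1.0
    return max(0.0, (right - x) / (right - center))


values = [1, 1, 2, 2, 3, 3, 10, 10, 11, 11]  # hypothetical purchase quantities
for theta in (0.0, 0.5):
    print(theta, [(iv.lo, iv.hi) for iv in cluster_intervals(values, theta)])
# 0.0 [(0.5, 10.5), (10.5, 11.5)]
# 0.5 [(0.5, 2.5), (2.5, 10.5), (10.5, 11.5)]
```

In this toy run, θ = 0 looks only at point counts, so the sparse region between 3 and 10 is absorbed into the low cluster and two intervals remain. With θ = 0.5, density similarity keeps that sparse region separate and yields three intervals, which illustrates why the choice of θ changes both the number and the placement of the membership functions.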
Because different θ values greatly affect the set of membership functions produced, the second approach incorporates a genetic algorithm to automatically determine the value of θ that produces the largest number of diverse rules. Each candidate θ value is encoded as a bit string. The fitness function combines two factors: the number of rules generated and the diversity of those rules, where diversity is evaluated by the average number of membership functions per item. The more intervals an item's membership functions have, the smaller each interval is, so the generated rules are more specific.
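A minimal sketch of the genetic search over θ follows, assuming several details the abstract leaves open. Each chromosome is a 10-bit string decoded linearly onto [0, 1], and the search uses tournament selection, one-point crossover, bit-flip mutation, and elitism. The fitness is taken as the product of the number of rules and the average number of membership functions per item. That product, the GA operators and rates, and the `mine` callback are all hypothetical. `mine` stands in for generating membership functions with a given θ and then running fuzzy temporal rule mining. `toy_mine` is a made-up function used only so the sketch runs; it is not data from the thesis.

```python
import random
from typing import Callable, Tuple

# Hypothetical callback: mine rules with the membership functions produced for
# a given theta and return (number_of_rules, avg_membership_functions_per_item).
MineFn = Callable[[float], Tuple[float, float]]


def decode(bits: str) -> float:
    """Map a bit string linearly onto theta in [0, 1]."""
    return int(bits, 2) / (2 ** len(bits) - 1)


def fitness(bits: str, mine: MineFn) -> float:
    n_rules, avg_mfs = mine(decode(bits))
    # Hypothetical combination: rewards many rules *and* finer partitions.
    return n_rules * avg_mfs


def evolve(mine: MineFn, n_bits=10, pop_size=20, generations=30,
           p_cross=0.8, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(n_bits)) for _ in range(pop_size)]
    cache = {}

    def score(c: str) -> float:
        # Mining is expensive, so evaluate each distinct chromosome once.
        if c not in cache:
            cache[c] = fitness(c, mine)
        return cache[c]

    def tournament() -> str:
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    def mutate(c: str) -> str:
        # Flip each bit independently with probability p_mut.
        return "".join("10"[int(b)] if rng.random() < p_mut else b for b in c)

    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best]  # elitism: the best chromosome always survives
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_bits)  # one-point crossover
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            nxt.extend([mutate(p1), mutate(p2)])
        pop = nxt[:pop_size]
        best = max(pop, key=score)
    return best, decode(best)


def toy_mine(theta: float) -> Tuple[float, float]:
    # Toy stand-in, NOT real data: more rules at low theta,
    # more membership functions per item at high theta.
    return 60 - 50 * theta, 1 + 4 * theta


bits, theta = evolve(toy_mine)
print(bits, round(theta, 3))
```

Fitness is cached per chromosome because each evaluation would re-run the whole mining pipeline. Elitism guarantees that the best θ found never gets worse from one generation to the next.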
Experiments were carried out on one simulated dataset and one real dataset to show the effectiveness of the proposed approaches. On the simulated dataset, the first approach greatly outperformed the previous approach, which uses predefined membership functions, in terms of both the number and the diversity of rules. Because the real dataset consisted of widely differing quantitative values, the first approach generated fewer rules there, but these rules related to much more specific fuzzy regions, making them more useful. For the second approach, the genetic algorithm successfully found the value of θ that produced the largest number of diverse rules. These rules uncovered interesting information within the datasets.