Zhang, R, Chen, W, Hsu, T, Yang, H and Chung, Y (2017) 'ANG: a combination of Apriori and graph computing techniques for frequent itemsets mining.' The Journal of Supercomputing. ISSN 1573-0484

Abstract
The Apriori algorithm is one of the most well-known and widely accepted methods for association rule mining. Apriori uses a prefix tree to represent k-itemsets, generates k-itemset candidates based on the frequent (k−1)-itemsets, and determines the frequent k-itemsets by traversing the prefix tree iteratively based on the transaction records. When k is small, the execution of Apriori is very efficient. However, the execution of Apriori can be very slow when k becomes large because of the deeper recursion depth needed to determine the frequent k-itemsets. From the perspective of graph computing, the transaction records can be converted to a graph G(V, E), where V is the set of vertices of G, representing the transaction records, and E is the set of edges of G, representing the relations among transaction records. Each k-itemset in the transaction records has a corresponding connected component in G. The number of vertices in that connected component is the support of the k-itemset. Since the time to find the corresponding connected component of a k-itemset in G is constant for any k, the graph computing method is very efficient if the number of k-itemsets is relatively small. Based on Apriori and graph computing techniques, a hybrid method, called Apriori and Graph Computing (ANG), is proposed to compute the frequent itemsets. Initially, ANG uses Apriori to compute the frequent k-itemsets and then switches to the graph computing method when k becomes large (where the number of k-itemset candidates is relatively small). The experimental results show that ANG outperforms both Apriori and the graph computing method for all test cases.
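The hybrid idea in the abstract, level-wise Apriori for small k followed by a switch to a different support-counting strategy for large k, can be illustrated with a minimal Python sketch. The abstract does not describe how the graph G(V, E) is built, so the sketch does not attempt it. As a loudly labeled stand-in, the second phase computes support by intersecting sets of transaction ids, which matches the abstract only in that support equals the number of transaction records containing the itemset. All function names and the `switch_k` parameter are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def generate_candidates(prev_frequent, k):
    """Join frequent (k-1)-itemsets into k-itemset candidates (Apriori join + prune)."""
    candidates = set()
    for a, b in combinations(prev_frequent, 2):
        union = a | b
        # Keep the union only if it has exactly k items and every
        # (k-1)-subset is itself frequent (the Apriori property).
        if len(union) == k and all(
            frozenset(s) in prev_frequent for s in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def count_by_scan(candidates, transactions):
    """Apriori-style counting: scan every transaction for every candidate."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def count_by_tidsets(candidates, tidsets):
    """Stand-in for the graph phase (an assumption, NOT the paper's method):
    support = size of the intersection of the items' transaction-id sets."""
    return {c: len(set.intersection(*(tidsets[i] for i in c))) for c in candidates}


def frequent_itemsets(transactions, min_support, switch_k=3):
    """Return {itemset: support}; switch counting strategy once k >= switch_k."""
    transactions = [frozenset(t) for t in transactions]

    # Vertical index: item -> ids of the transactions that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    # Frequent 1-itemsets come straight from the index.
    current = {
        frozenset([item]): len(tids)
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }
    result = dict(current)

    k = 2
    while current:
        candidates = generate_candidates(set(current), k)
        if k < switch_k:
            counts = count_by_scan(candidates, transactions)
        else:
            counts = count_by_tidsets(candidates, tidsets)
        current = {c: n for c, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result
```

Calling `frequent_itemsets(transactions, min_support)` with a list of item collections returns a dictionary mapping each frequent itemset (as a `frozenset`) to its support. Both counting strategies return the same supports, so `switch_k` affects only where the work is done, not the result.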
Item Type:  Article 

Keywords:  Apriori, graph computing, frequent itemset mining, data mining 
Divisions:  School of Creative Industries 
Identification Number:  https://doi.org/10.1007/s11227-017-2049-z 
Date Deposited:  19 Apr 2017 14:44 
Last Modified:  12 Mar 2020 14:55 