Publication | Closed Access
AN EFFICIENT IMPLEMENTATION OF APRIORI ALGORITHM BASED ON HADOOP-MAPREDUCE MODEL
Citations: 55
References: 11
Year: 2012
Unknown Venue
Cluster Computing, Engineering, Frequent Pattern Mining, Data Science, Data Mining, Association Rule, MRApriori Algorithm, Knowledge Discovery, Frequent Itemsets, Hadoop-MapReduce Model, Pattern Mining, Computer Science, Map-reduce, Parallel Computing, Mining Methods, Data Management, Massive Data Processing, Big Data
Finding frequent itemsets is one of the most important tasks in data mining. The Apriori algorithm is the most established algorithm for finding frequent itemsets in a transactional dataset; however, it must scan the dataset many times and generate many candidate itemsets. When the dataset is huge, both memory use and computational cost can be very expensive. In addition, a single processor's memory and CPU resources are limited, which makes the algorithm inefficient. Parallel and distributed computing are effective strategies for accelerating algorithm performance. In this paper, we have implemented an efficient MapReduce Apriori algorithm (MRApriori) based on the Hadoop-MapReduce model, which needs only two phases (MapReduce jobs) to find all frequent k-itemsets. We compared the proposed MRApriori algorithm with two existing algorithms, which need either one phase or k phases (where k is the maximum length of the frequent itemsets) to find the same frequent k-itemsets. Experimental results showed that the proposed MRApriori algorithm outperforms the other two algorithms.
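The abstract states that two MapReduce jobs suffice but does not say what each job does. The following single-process Python sketch shows one common way to get there, a partition-based split in which the first job mines each data split locally and the second job counts the merged candidates globally. This division of work is an assumption made for illustration, not a description confirmed by the abstract, and the names `local_apriori`, `mine_two_phase`, and `example_splits` are hypothetical. Plain loops stand in for Hadoop mappers and reducers.

```python
import math
from collections import Counter
from itertools import combinations


def local_apriori(transactions, min_count):
    """Level-wise Apriori on one in-memory split; returns a set of frozensets."""
    items = Counter(item for t in transactions for item in t)
    current = {frozenset([i]) for i, n in items.items() if n >= min_count}
    frequent = set(current)
    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count: one scan of the split per level.
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {c for c, n in counts.items() if n >= min_count}
        frequent |= current
        k += 1
    return frequent


def mine_two_phase(splits, min_support):
    """Assumed two-job scheme: local mining, then global counting."""
    # Job 1 (map): each split mines its locally frequent itemsets.
    # Job 1 (reduce): the union of the local results is the global candidate set.
    candidates = set()
    for split in splits:
        candidates |= local_apriori(split, math.ceil(min_support * len(split)))
    # Job 2 (map): each split counts the candidates contained in its transactions.
    # Job 2 (reduce): partial counts are summed and filtered by the global threshold.
    totals = Counter()
    for split in splits:
        for t in split:
            totals.update(c for c in candidates if c <= t)
    total = sum(len(split) for split in splits)
    return {c: n for c, n in totals.items() if n >= min_support * total}


# Hypothetical toy data: two splits of four transactions each.
example_splits = [
    [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")],
    [frozenset("ab"), frozenset("bc"), frozenset("abc"), frozenset("c")],
]
```

Calling `mine_two_phase(example_splits, 0.5)` returns a dictionary mapping each globally frequent itemset to its support count. Under this scheme no frequent itemset is missed, because an itemset that reaches the relative threshold over the whole dataset must reach it in at least one split.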