Publication | Closed Access
Hash based parallel algorithms for mining association rules
Citations: 130
References: 10
Year: 2002
Venue: Unknown
Topics: Cluster Computing, Engineering, Computer Architecture, Pattern Mining, Map-reduce, Candidate Itemsets, Parallel Algorithms, Data Science, Data Mining, Parallel Computing, Data Management, Association Rules, Parallel Database, Knowledge Discovery, Computer Engineering, Hash Function, Computer Science, Data-intensive Computing, Frequent Pattern Mining, Association Rule, Parallel Programming, Big Data
We propose four parallel algorithms (NPA, SPA, HPA and HPA-ELD) that improve the performance of association rule mining on shared-nothing parallel machines. In NPA, the candidate itemsets are simply copied to every processor, which can lead to memory overflow for large transaction databases. The other three algorithms partition the candidate itemsets across the processors. If the candidates are partitioned simply (SPA), the transaction data has to be broadcast to all processors. HPA partitions the candidate itemsets with a hash function, which eliminates broadcasting and also significantly reduces the comparison workload. HPA-ELD makes full use of the available memory by detecting extremely large itemsets and copying them to all processors, which is also very effective at flattening the load across the processors. We implemented these algorithms in a shared-nothing environment. Performance evaluations show that the best algorithm, HPA-ELD, attains good linearity in speedup and handles skew effectively.
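The following minimal single-process Python sketch illustrates the HPA idea. It assumes that transactions are sets of integer item ids and that candidates are sorted tuples. A shared-nothing machine is simulated as a list of per-processor transaction partitions. The hash function `owner` and the helpers `hpa_count` and `npa_count` are hypothetical names chosen for illustration, not the authors' implementation.

In the HPA-style pass, each processor keeps only the candidates that hash to it and sends every k-subset of its transactions to that subset's owner. The NPA-style baseline instead replicates all candidates and sums the local counts.

```python
from collections import Counter
from itertools import combinations


def owner(itemset, num_procs):
    """Hypothetical hash: map a sorted tuple of int items to a processor id."""
    h = 0
    for item in itemset:
        h = h * 31 + item
    return h % num_procs


def hpa_count(partitions, candidates, k):
    """Simulate one HPA-style counting pass over a shared-nothing machine.

    partitions: one list of transactions (sets of ints) per processor.
    candidates: set of sorted k-tuples.
    Returns (support counts, number of k-subsets sent to another processor).
    """
    n = len(partitions)
    # Each processor stores only the candidates that hash to it.
    local_cands = [set() for _ in range(n)]
    for c in candidates:
        local_cands[owner(c, n)].add(c)

    counts = [Counter() for _ in range(n)]
    messages = 0
    for src, transactions in enumerate(partitions):
        for t in transactions:
            for sub in combinations(sorted(t), k):
                dst = owner(sub, n)  # the only processor that may count sub
                if dst != src:
                    messages += 1    # sent to a single owner, never broadcast
                if sub in local_cands[dst]:
                    counts[dst][sub] += 1

    # Candidate partitions are disjoint, so merging is a plain union.
    total = Counter()
    for local in counts:
        total.update(local)
    return total, messages


def npa_count(partitions, candidates, k):
    """NPA-style baseline: every processor holds all candidates."""
    total = Counter()
    for transactions in partitions:
        local = Counter()
        for t in transactions:
            for sub in combinations(sorted(t), k):
                if sub in candidates:
                    local[sub] += 1
        total.update(local)  # stands in for exchanging the local counts
    return total


# Usage: three simulated processors, candidate 2-itemsets, minimum support 3.
partitions = [
    [{1, 2, 3}, {1, 2}, {2, 3}],  # processor 0
    [{1, 3}, {1, 2, 3, 4}],       # processor 1
    [{2, 4}, {1, 2, 4}],          # processor 2
]
candidates = {(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)}
hpa, msgs = hpa_count(partitions, candidates, 2)
assert hpa == npa_count(partitions, candidates, 2)
large = {c for c, n in hpa.items() if n >= 3}
print(sorted(large))  # [(1, 2), (1, 3), (2, 3), (2, 4)]
```

The assertion checks that hash partitioning changes where each candidate is counted but not the resulting supports. HPA-ELD's additional step, which copies extremely large itemsets to every processor and counts them locally, is left out of this sketch.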