Publication | Closed Access
Parallelization of decision tree algorithm and its performance evaluation
Citations: 16
References: 6
Year: 2000
Unknown Venue
Cluster Computing, Engineering, Business Intelligence, Computational Complexity, Parallel Metaheuristics, Optimization-based Data Mining, Decision Tree Algorithm, Intra-node Parallelism, Data Science, Data Mining, Decision Tree, Parallel Complexity Theory, Management, Decision Tree Learning, Parallel Computing, Combinatorial Optimization, Data Management, Parallel Database, Knowledge Discovery, Computer Engineering, Attribute Parallelism, Computer Science, Evolutionary Data Mining, Record Parallelism, Parallel Programming, Classification, Big Data
Data mining is a typical application of high-performance computing in the business field, and an efficient data mining system that can handle huge amounts of data is desired. This paper describes the parallel processing of decision tree construction, a typical algorithm for classifying large databases. C4.5, a freely available software package, is parallelized for an SMP machine using a thread library. Parallelism in generating a decision tree can be classified into intra-node parallelism and inter-node parallelism. Intra-node parallelism can be further classified into record parallelism, attribute parallelism, and their combination. We have implemented these four kinds of parallelizing methods and evaluated their effects with four kinds of test data. The results show that there is a relation between the characteristics of the data and the parallelizing methods, and that a combination of multiple parallelizing methods is the most effective.
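To make the two intra-node strategies named in the abstract concrete, the following is a minimal Python sketch and not the paper's C4.5 implementation. It assumes categorical attributes, records stored as tuples with the class label last, and plain information gain as the split criterion (C4.5 itself uses gain ratio). All function names are hypothetical. Record parallelism splits the records of one tree node across workers and merges their counts; attribute parallelism gives each worker one candidate attribute to evaluate. Inter-node parallelism, which builds separate subtrees concurrently, is not shown.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2


def entropy(counts):
    """Shannon entropy (bits) of a Counter of class labels."""
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values() if n)


def gain_from_counts(joint):
    """Information gain from a Counter keyed by (attribute_value, label)."""
    total = sum(joint.values())
    labels, by_value = Counter(), {}
    for (value, label), n in joint.items():
        labels[label] += n
        by_value.setdefault(value, Counter())[label] += n
    remainder = sum(sum(c.values()) / total * entropy(c) for c in by_value.values())
    return entropy(labels) - remainder


def count_chunk(chunk, attr):
    """Tally (attribute_value, label) pairs for one slice of the records."""
    return Counter((row[attr], row[-1]) for row in chunk)


def gain_record_parallel(records, attr, pool, n_chunks=4):
    """Record parallelism: count slices concurrently, then merge the tallies."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    merged = Counter()
    for partial in pool.map(count_chunk, chunks, [attr] * len(chunks)):
        merged.update(partial)
    return gain_from_counts(merged)


def best_attribute_parallel(records, n_attrs, pool):
    """Attribute parallelism: each task evaluates one candidate attribute."""
    gains = list(pool.map(
        lambda a: gain_from_counts(count_chunk(records, a)), range(n_attrs)))
    best = max(range(n_attrs), key=gains.__getitem__)
    return best, gains
```

A pool created with `ThreadPoolExecutor(max_workers=4)` can be passed to either function. In CPython, threads running pure-Python CPU-bound code are serialized by the global interpreter lock, so this sketch illustrates how the work is divided rather than the speedups a C thread library can reach on an SMP machine. The trade-off it exposes is the one the abstract points to: record parallelism pays off when a node holds many records, attribute parallelism when there are many attributes.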