Publication | Closed Access
A Distributed Decision Tree Algorithm and Its Implementation on Big Data Platforms
Citations: 11
References: 16
Year: 2016
Unknown Venue
Cluster Computing, Engineering, Distributed Data Analytics, Big Data Infrastructure, Optimization-based Data Mining, Data Science, Data Mining, Decision Tree, Distributed Environment, Big Data Architecture, Decision Tree Learning, Parallel Computing, Data Management, High-performance Data Analytics, Predictive Analytics, Knowledge Discovery, Computer Science, Big Data Search, Parallel Programming, Big Data Platforms, Decision Tree Algorithms, Massive Data Processing, Big Data
Decision tree algorithms are widely used in data mining. This paper proposes a distributed decision tree algorithm and shows examples of its implementation on big data platforms. Its major contribution is the novel KS-Tree algorithm, which builds a decision tree in a distributed environment. KS-Tree is applied to several real-world data mining problems and compared with state-of-the-art decision tree techniques implemented in R and Apache Spark. The results show that KS-Tree can achieve better results, especially on large data sets. Furthermore, we demonstrate that KS-Tree can be applied to various data mining tasks, such as variable selection.