Publication | Closed Access
Efficient decision tree construction on streaming data
Citations: 148
References: 27
Year: 2003
Unknown Venue
Cluster Computing, Engineering, Machine Learning, Streaming Algorithm, Computational Complexity, Numerical Interval Pruning, Optimization-based Data Mining, Information Retrieval, Data Science, Data Mining, Decision Tree, Data Integration, Decision Tree Learning, Data Management, Data Optimization, Knowledge Discovery, Computer Science, Data Stream Management, Decision Tree Construction, Data Stream Mining, Parallel Programming, Big Data
Decision tree construction is a well-studied problem in data mining. Recently, there has been much interest in mining streaming data. Domingos and Hulten have presented a one-pass algorithm for decision tree construction. Their work uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the tree constructed. In this paper, we revisit this problem. We make the following two contributions: 1) We present a numerical interval pruning (NIP) approach for efficiently processing numerical attributes. Our results show an average 39% reduction in execution time. 2) We exploit the properties of the entropy (and Gini) gain functions to reduce the sample size required for obtaining a given bound on the accuracy. Our experimental results show a 37% reduction in the number of data instances required.
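As background for the bound mentioned in the abstract, the following minimal sketch shows the standard Hoeffding bound: after n observations of a random variable with range R, the sample mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability at least 1 - delta. The function names `hoeffding_bound` and `samples_needed` are illustrative assumptions and do not come from the paper, and the sketch does not implement the paper's NIP technique or its reduced sample-size method.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding epsilon for n observations of a variable with the given range.

    Hypothetical helper for illustration; not taken from the paper.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def samples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the Hoeffding epsilon is at most the requested epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Assumed example values: range 1.0, confidence 95%, tolerance 0.1.
    n = samples_needed(1.0, 0.05, 0.1)
    print(n, hoeffding_bound(1.0, 0.05, n))
```

In a streaming tree learner of this kind, a split is typically chosen once the observed gain difference between the two best candidate attributes exceeds this epsilon, so a smaller required sample size translates directly into earlier splits.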