Publication | Closed Access
Impact of learning set quality and size on decision tree performances.
Citations: 47
References: 19
Year: 2000
Unknown Venue
Abstract. The quality of a decision tree is usually evaluated through its complexity and its generalization accuracy. Tree-simplification procedures aim at optimizing these two performance criteria. Among them, data reduction techniques differ from pruning in their simplification strategy. While pruning algorithms directly control tree size to combat overfitting, data reduction techniques preprocess the data before decision tree construction to improve the quality of the learning set. Recent experimental results have shown that randomly manipulating training set size has a direct impact on tree size, and therefore recommend the use of the latter simplification strategy. In this paper, we provide theoretical arguments justifying data preprocessing in favor of tree simplification. We also investigate new data reduction techniques, usually used in the field of prototype selection. From experiments with 22 datasets, we show that some of them are very effective at improving standard post-pruning performance.