Generalization in Decision Trees and DNF: Does Size Matter?

Abstract

Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we show that these techniques can be more widely applied, by representing other boolean functions as two-layer neural networks (thresholded convex combinations of boolean functions). For example, we show that with high probability any decision tree of depth no more than d that is consistent with m training examples has misclassification probability no more than O i \\Gamma 1 m \\Gamma N eff VCdim(U) log 2 m log d \\Delta\\Delta 1=2 j , where U is the class of node decision functions, and N eff N can be thought of as the effective number of leaves (it becomes small as the distribution on the leaves induced by the training data gets...

References

Page 1

	Year	Citations

Page 1