Learning Decision Trees Using the Fourier Spectrum

TLDR

The decision tree model considered extends the traditional Boolean decision tree by allowing a linear operation at each node (the sum over GF(2) of a subset of the input variables). The study presents a polynomial-time algorithm, using membership queries, for learning such decision trees under the uniform distribution. More generally, the algorithm learns in polynomial time any function that can be approximated in L2-norm by a polynomially sparse function, i.e., one with polynomially many nonzero Fourier coefficients. The study shows that every function with polynomial L1-norm, including decision trees with linear operations, admits such an approximation, and that functions with polynomial L1-norm can be learned deterministically. It also shows that a depth-d tree can be exactly identified in time polynomial in 2^d and n, so trees of logarithmic depth are learnable in polynomial time.
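The step from a polynomial L1-norm to a polynomially sparse approximation can be seen with a short counting argument. The sketch below is one standard way to make it precise, assuming the threshold $\varepsilon/L$ (the paper's exact constants may differ), with $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$ and $L = L_1(f) = \sum_S |\hat f(S)|$. The equality in the second line is Parseval's identity applied to $f - g$, whose Fourier coefficients are exactly the dropped ones.

```latex
\begin{align*}
g &= \sum_{S:\ |\hat f(S)| \ge \varepsilon/L} \hat f(S)\,\chi_S, \\
\|f-g\|_2^2 &= \sum_{S:\ |\hat f(S)| < \varepsilon/L} \hat f(S)^2
  \le \frac{\varepsilon}{L}\sum_S |\hat f(S)| = \varepsilon, \\
\#\{S : |\hat f(S)| \ge \varepsilon/L\} &\le \frac{L}{\varepsilon/L} = \frac{L^2}{\varepsilon}.
\end{align*}
```

So when $L$ is polynomial in $n$, $g$ has polynomially many terms and is within $\varepsilon$ of $f$ in squared $L_2$-norm.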

Abstract

This work gives a polynomial-time algorithm for learning decision trees with respect to the uniform distribution. (This algorithm uses membership queries.) The decision tree model considered is an extension of the traditional Boolean decision tree model that allows a linear operation in each node (i.e., the summation of a subset of the input variables over $GF(2)$). This paper shows how to learn in polynomial time any function that can be approximated (in $L_2$-norm) by a polynomially sparse function (i.e., a function with only polynomially many nonzero Fourier coefficients). The authors demonstrate that any function $f$ whose $L_1$-norm (i.e., the sum of the absolute values of the Fourier coefficients) is polynomial can be approximated by a polynomially sparse function, and prove that Boolean decision trees with linear operations belong to this class of functions. Moreover, it is shown that the functions with polynomial $L_1$-norm can be learned deterministically. The algorithm can also exactly identify a decision tree of depth $d$ in time polynomial in $2^d$ and $n$, the number of input variables. This result implies that trees of logarithmic depth can be identified in polynomial time.
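The objects in the abstract can be made concrete on a toy instance. The following is a minimal sketch, assuming a hypothetical tuple encoding of trees and illustrative helper names (`evaluate`, `fourier_spectrum`, `sparse_approximation` are not from the paper), outputs in $\{-1,+1\}$, and $\chi_S(x) = (-1)^{\langle S, x\rangle}$. It computes the full spectrum by brute force, which queries all $2^n$ inputs and is therefore not the paper's polynomial-time algorithm. It also checks two standard facts that are assumed here rather than quoted from the abstract: the $L_1$-norm of such a tree is at most its number of leaves, and every coefficient of a depth-$d$ tree is a multiple of $2^{-d}$.

```python
# Hypothetical encoding (not from the paper) of a decision tree with linear
# operations over GF(2). Inputs are integers x whose bit i is the variable x_i.
#   ("node", mask, left, right): go right iff the XOR of the bits of x selected
#                                by `mask` is 1, otherwise go left
#   ("leaf", value):             output value in {-1, +1}


def parity(mask, x):
    """Inner product <mask, x> over GF(2): XOR of the bits of x selected by mask."""
    return bin(mask & x).count("1") % 2


def evaluate(tree, x):
    """Walk from the root to a leaf and return the leaf's value."""
    while tree[0] == "node":
        _, mask, left, right = tree
        tree = right if parity(mask, x) else left
    return tree[1]


def leaves(tree):
    return 1 if tree[0] == "leaf" else leaves(tree[2]) + leaves(tree[3])


def depth(tree):
    return 0 if tree[0] == "leaf" else 1 + max(depth(tree[2]), depth(tree[3]))


def fourier_spectrum(f, n):
    """Return [hat f(S) for S in range(2**n)], hat f(S) = E_x[f(x) * (-1)^<S,x>].

    Fast Walsh-Hadamard transform over all 2^n inputs. This queries f
    everywhere, so it is exponential in n and only meant for toy examples.
    """
    size = 1 << n
    a = [f(x) for x in range(size)]
    h = 1
    while h < size:
        for i in range(0, size, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [c / size for c in a]


def sparse_approximation(coeffs, eps):
    """Keep the coefficients with |hat f(S)| >= eps / L1(f); assumes f != 0."""
    l1 = sum(abs(c) for c in coeffs)
    return {S: c for S, c in enumerate(coeffs) if abs(c) >= eps / l1}


# Toy tree on n = 4 variables: the root tests x0 XOR x1.
tree = ("node", 0b0011,
        ("node", 0b0100, ("leaf", 1), ("leaf", -1)),  # tests x2
        ("node", 0b1110, ("leaf", -1), ("leaf", 1)))  # tests x1 XOR x2 XOR x3
n = 4
coeffs = fourier_spectrum(lambda x: evaluate(tree, x), n)
nonzero = {S: c for S, c in enumerate(coeffs) if c != 0}
l1 = sum(abs(c) for c in coeffs)
print(nonzero)           # {4: 0.5, 7: 0.5, 13: 0.5, 14: -0.5}
print(l1, leaves(tree))  # 2.0 4

# Depth-d tree: every coefficient is an integer multiple of 2^-d.
d = depth(tree)
assert all((c * 2 ** d).is_integer() for c in coeffs)

# Dropping small coefficients costs at most eps in squared L2 error (Parseval).
for eps in (0.5, 1.5):
    kept = sparse_approximation(coeffs, eps)
    err = sum(c * c for S, c in enumerate(coeffs) if S not in kept)
    assert err <= eps
```

For this tree the four nonzero coefficients are $\pm 1/2$, on the sets $\{x_2\}$, $\{x_0,x_1,x_2\}$, $\{x_0,x_2,x_3\}$ and $\{x_1,x_2,x_3\}$, so the $L_1$-norm is 2, below the leaf count of 4, and every coefficient is a multiple of $2^{-2}$.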
