Concepedia

Publication | Open Access

Tree Induction Vs. Logistic Regression: a Learning-Curve Analysis

Citations: 145

References: 45

Year: 2001

Abstract

Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several things. (1) Contrary to some prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (that is, the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of the separability of signal from noise.
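The two measurements the abstract mentions, ranking quality from class-membership probabilities and the point at which learning curves cross, can be illustrated with a minimal sketch. It assumes ranking quality is scored as the area under the ROC curve, which the abstract does not state. The function names `auc` and `first_crossover` are hypothetical, and the accuracy values are made-up toy numbers chosen only to show a crossing; they are not results from the paper.

```python
from typing import Optional, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def first_crossover(
    sizes: Sequence[int],
    curve_a: Sequence[float],
    curve_b: Sequence[float],
) -> Optional[int]:
    """Smallest training-set size where curve_b overtakes curve_a, or None."""
    for i in range(1, len(sizes)):
        a_led_before = curve_a[i - 1] >= curve_b[i - 1]
        b_leads_now = curve_b[i] > curve_a[i]
        if a_led_before and b_leads_now:
            return sizes[i]
    return None


# Toy learning curves (illustrative values only, not from the paper).
train_sizes = [100, 1000, 10000, 100000]
logistic_acc = [0.80, 0.83, 0.84, 0.84]  # strong early, then flat
tree_acc = [0.72, 0.80, 0.85, 0.88]      # weak early, keeps improving

crossing = first_crossover(train_sizes, logistic_acc, tree_acc)

# Toy class-membership probabilities for four test examples.
toy_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

With these toy curves, comparing the methods only at 1,000 examples would favor logistic regression and comparing only at 100,000 would favor the tree, which is why the abstract argues that superiority claims should rest on the whole curve.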
