Publication | Closed Access
The Lack of A Priori Distinctions Between Learning Algorithms
1.6K
Citations
21
References
1996
Year
Artificial IntelligenceStatistical LearningEngineeringMachine LearningAlgorithmic LearningLearning AlgorithmData ScienceData MiningPattern RecognitionSupervised LearningTraining SetCognitive ScienceComputational Learning TheoryLoss FunctionsKnowledge DiscoveryComputer ScienceStatistical Learning TheoryCost-sensitive Machine LearningLearning Classifier System
The paper investigates assumption‑free relationships between learning algorithms using off‑training‑set error, argues that there are no a priori distinctions between algorithms, and discusses the implications for computational learning theory. It employs off‑training‑set error analysis to compare algorithms, noting that the second paper will address cases where distinctions do exist. The authors show that for any two algorithms there are as many targets where each outperforms the other, that cross‑validation can be worse than its anti‑counterpart, and that low empirical error, small VC dimension, and large training sets do not guarantee low off‑training‑set error, with further consequences for membership‑query and punting algorithms.
This is the first of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are “as many” targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is “anti-cross-validation” (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one cannot say: if empirical misclassification rate is low, the Vapnik-Chervonenkis dimension of your generalizer is small, and the training set is large, then with high probability your OTS error is small. Other implications for “membership queries” algorithms and “punting” algorithms are also discussed.
| Year | Citations | |
|---|---|---|
Page 1
Page 1