Publication | Closed Access
Convexity, Classification, and Risk Bounds
1K
Citations
33
References
2006
Year
EngineeringMachine LearningConvex Loss FunctionsRisk AnalysisMachine Learning LiteratureClassification MethodPattern RecognitionStatisticsSupervised LearningLinear OptimizationRisk BoundsComputational Learning TheoryLoss FunctionProbability TheoryComputer ScienceStatistical Learning TheoryConvex OptimizationStatistical InferenceCost-sensitive Machine Learning
Classification algorithms such as SVM and boosting minimize convex surrogates of the 0–1 loss, yielding computational efficiency but introducing statistical trade‑offs that must be understood. The authors aim to quantify the relationship between 0–1 loss risk and surrogate loss risk for any nonnegative surrogate. They derive this relationship via a simple variational transformation of the loss function that is computationally straightforward. The resulting bounds provide nontrivial excess‑risk limits under Fisher consistency, show that strictly convex losses achieve faster convergence under low‑noise conditions, and enable estimation of convergence rates for function classes that are scaled convex hulls of finite‑dimensional bases.
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0–1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function—that it satisfies a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise, and show that in this case, strictly convex loss functions lead to faster rates of convergence of the risk than would be implied by standard uniform convergence arguments. Finally, we present applications of our results to the estimation of convergence rates in function classes that are scaled convex hulls of a finite-dimensional base class, with a variety of commonly used loss functions.
| Year | Citations | |
|---|---|---|
Page 1
Page 1