Using AUC and accuracy in evaluating learning algorithms

TLDR

AUC has been used in medical diagnosis since the 1970s and has recently been proposed as a single-number measure for evaluating learning algorithms, but no formal argument had been given for preferring it over accuracy. The study establishes formal criteria for comparing evaluation measures and shows, both theoretically and empirically, that AUC is a better measure than accuracy. Reevaluating well-established accuracy-based claims with AUC yields surprising new results: Naive Bayes and decision trees, long accepted as very similar in predictive accuracy, differ in AUC, with Naive Bayes significantly better. These conclusions may have a significant impact on machine learning and data mining applications.

Abstract

The area under the ROC (receiver operating characteristics) curve, or simply AUC, has been traditionally used in medical diagnosis since the 1970s. It has recently been proposed as an alternative single-number measure for evaluating the predictive ability of learning algorithms. However, no formal arguments were given as to why AUC should be preferred over accuracy. We establish formal criteria for comparing two different measures for learning algorithms and we show theoretically and empirically that AUC is a better measure (defined precisely) than accuracy. We then reevaluate well-established claims in machine learning based on accuracy using AUC and obtain interesting and surprising new results. For example, it has been well-established and accepted that Naive Bayes and decision trees are very similar in predictive accuracy. We show, however, that Naive Bayes is significantly better than decision trees in AUC. The conclusions drawn in this paper may make a significant impact on machine learning and data mining applications.
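
To make the contrast between the two measures concrete, here is a minimal Python sketch. It assumes a binary task, a fixed decision threshold of 0.5 for accuracy, and the pairwise (Mann-Whitney) formulation of AUC, i.e. the fraction of positive-negative pairs in which the positive example receives the higher score. The labels, the scores, and the function names `accuracy` and `auc` are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: share of positive-negative pairs
    ranked correctly, counting ties as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: five negatives followed by five positives.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Classifier A makes its two mistakes close to the decision threshold.
scores_a = [0.05, 0.15, 0.25, 0.35, 0.55, 0.45, 0.65, 0.75, 0.85, 0.95]

# Classifier B makes its two mistakes at the extremes of the ranking.
scores_b = [0.15, 0.25, 0.35, 0.45, 0.95, 0.05, 0.55, 0.65, 0.75, 0.85]

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(f"{name}: accuracy={accuracy(labels, scores):.2f} "
          f"AUC={auc(labels, scores):.2f}")
```

On this toy data both classifiers misclassify two of ten examples, so accuracy cannot separate them, while AUC differs because it also reflects where in the ranking the mistakes occur.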
