Concepedia

TLDR

The paper introduces CatBoost, a gradient boosting toolkit that combines ordered boosting with a new algorithm for processing categorical features. Both techniques target the prediction shift caused by a form of target leakage present in existing gradient boosting implementations. The authors report that CatBoost outperforms other publicly available boosting implementations on a variety of datasets.

Abstract

This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms. In this paper, we provide a detailed analysis of this problem and demonstrate that the proposed algorithms solve it effectively, leading to excellent empirical results.
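
The idea behind the categorical feature processing can be illustrated with a small sketch. The abstract does not give the formula, so the code below rests on an assumption: each example's category is encoded with a smoothed mean of the targets of the examples that precede it in a random permutation, so an example's own target never enters its own encoding. The function name and the `prior`, `prior_weight`, and `seed` parameters are hypothetical and are not CatBoost's API. The sketch covers only the categorical encoding, not ordered boosting itself.

```python
import random


def ordered_target_statistic(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode a categorical column using only 'earlier' examples.

    Illustrative sketch, not CatBoost's implementation. For each example,
    in the order given by a random permutation, the encoding is
        (sum of earlier targets in the same category + prior_weight * prior)
        / (count of earlier examples in the same category + prior_weight)
    """
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # the random permutation

    running_sum = {}    # category -> sum of targets seen so far
    running_count = {}  # category -> number of examples seen so far
    encoded = [0.0] * n

    for idx in order:
        cat = categories[idx]
        s = running_sum.get(cat, 0.0)
        c = running_count.get(cat, 0)
        # Encode first, using only the history of preceding examples.
        encoded[idx] = (s + prior_weight * prior) / (c + prior_weight)
        # Only then add this example's target to the history.
        running_sum[cat] = s + targets[idx]
        running_count[cat] = c + 1

    return encoded


if __name__ == "__main__":
    cats = ["a", "a", "b", "a", "b"]
    ys = [1, 0, 1, 1, 0]
    print(ordered_target_statistic(cats, ys, prior=0.5))
```

By contrast, a plain per-category target mean computed over the whole training set includes each example's own label, which is one way the target leakage described above can arise.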
