LightGBM: A Highly Efficient Gradient Boosting Decision Tree

TLDR

Gradient Boosting Decision Trees are widely used, yet existing implementations struggle with high‑dimensional, large‑scale data because they must scan all instances for each feature to compute split gains. This work introduces Gradient‑based One‑Side Sampling and Exclusive Feature Bundling to address this inefficiency. GOSS discards many low‑gradient samples and uses the remaining ones to estimate split gains, while EFB groups mutually exclusive features into bundles to reduce the feature space. Theoretical proofs and experiments show that LightGBM, which incorporates both techniques, can accelerate training by over 20× while preserving accuracy.

Abstract

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT. Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large. A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time consuming. To tackle this problem, we propose two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously), to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve quite good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much). We call our new GBDT implementation with GOSS and EFB LightGBM. Our experiments on multiple public datasets show that, LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.

References

Page 1

	Year	Citations
Greedy function approximation: A gradient boosting machine. Jerome H. Friedman The Annals of Statistics EngineeringMachine LearningData MiningPattern RecognitionGreedy Function Approximation	2001	27.3K
Stochastic gradient boosting Jerome H. Friedman Computational Statistics & Data Analysis EngineeringMachine LearningData ScienceData MiningPattern Recognition	2002	6.9K
Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors) Jerome H. Friedman, Trevor Hastie, Robert Tibshirani The Annals of Statistics EngineeringMachine LearningText MiningClassification MethodData Science	2000	6.9K
Predicting clicks Matthew Richardson, Ewa Dominowska, Robert Ragno EngineeringMachine LearningAdvertising SystemTargeted AdvertisingSearch Engine Marketing	2007	1K
SPRINT: A Scalable Parallel Classifier for Data Mining John Shafer, Rakesh Agrawal, Manish Mehta	1996	781
Generalized Boosted Models: A guide to the gbm package Greg Ridgeway	2006	769
McRank: Learning to Rank Using Multiple Classification and Gradient Boosting Ping Li, Qiang Wu, Christopher J. C. Burges	2007	434
Accelerating the XGBoost algorithm using GPU computing Rory Mitchell, Eibe Frank PeerJ Computer Science Library XgboostEngineeringMachine LearningGpu BenchmarkingComputer Architecture	2017	318
Hyperspectral data analysis and supervised feature reduction via projection pursuit Luís Ortiz Jiménez, D. A. Landgrebe IEEE Transactions on Geoscience and Remote Sensing EngineeringMachine LearningFeature ExtractionFeature SelectionFull Dimensionality	1999	281
Introducing LETOR 4.0 Datasets Tao Qin, Tieyan Liu arXiv (Cornell University) EngineeringIntelligent Information RetrievalLearning To RankSemantic WebLarge-scale Datasets	2013	199

Page 1