Condensed Gradient Boosting

Seyedsaman Emami; Gonzalo Martínez-Muñoz

TL;DR

The paper addresses the computational burden of standard gradient boosting in multi-class classification and multi-output regression, where one tree per class per iteration is typically trained. It introduces Condensed Gradient Boosting (C-GB), which uses a single multi-output tree with vector-valued leaves and a two-step optimization: first fitting the base learner to pseudo-residuals via least-squares, then applying a Newton-Raphson refinement to update leaf outputs. Through extensive experiments on 12 multi-class and 3 multi-output regression datasets, C-GB shows comparable or improved generalization relative to standard GB while reducing training and prediction times, and it often outperforms competing multi-output approaches like TFBT and GBDT-MO. The results suggest substantial reductions in ensemble complexity with preserved accuracy, highlighting C-GB’s practical appeal for large-scale, multi-target problems, and the work provides an open-source implementation for broader use and extension. The additive update is $\hat{\mathbf{F}}_m(\mathbf{x}) = \hat{\mathbf{F}}_{m-1}(\mathbf{x}) + \nu \tilde{\mathbf{h}}_m(\mathbf{x})$, with $\tilde{\mathbf{h}}_m(\mathbf{x}) = \{\gamma_{k,m} h_{k,m}(\mathbf{x})\}_{k=1}^K$ in the multi-output setting.
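The two-step procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a softmax multi-class loss, replaces the multi-output tree with a depth-1 stump to stay short, and uses Friedman's classic one-step Newton formula for the leaf values. All function names (`fit_stump`, `newton_leaf_values`, `condensed_boost`, `predict`) are hypothetical.

```python
import numpy as np

def softmax(F):
    # Row-wise softmax with max-subtraction for numerical stability.
    Z = np.exp(F - F.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def fit_stump(X, R):
    """Step 1 (least squares): choose ONE split shared by all K outputs.

    Assumption: a depth-1 stump stands in for the multi-output tree.
    """
    best_sse, best_j, best_t = np.inf, 0, 0.0
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2.0:  # midpoints between values
            left = X[:, j] <= t
            # Squared error summed over both leaves and all K outputs.
            sse = sum(((R[m] - R[m].mean(axis=0)) ** 2).sum()
                      for m in (left, ~left))
            if sse < best_sse:
                best_sse, best_j, best_t = sse, j, t
    return best_j, best_t

def newton_leaf_values(R, leaf_mask, K, eps=1e-12):
    """Step 2 (Newton-Raphson): vector-valued leaf output, one entry per class.

    Assumption: Friedman's one-step approximation for the softmax loss.
    """
    r = R[leaf_mask]
    num = r.sum(axis=0)
    den = (np.abs(r) * (1.0 - np.abs(r))).sum(axis=0)
    return (K - 1) / K * num / (den + eps)

def condensed_boost(X, y, K, n_iter=20, lr=0.1):
    Y = np.eye(K)[y]                  # one-hot targets, shape (n, K)
    F = np.zeros((len(y), K))         # raw scores for all classes
    model = []
    for _ in range(n_iter):
        R = Y - softmax(F)            # pseudo-residuals for all K classes
        j, t = fit_stump(X, R)
        left = X[:, j] <= t
        g_left = lr * newton_leaf_values(R, left, K)
        g_right = lr * newton_leaf_values(R, ~left, K)
        F += np.where(left[:, None], g_left, g_right)
        model.append((j, t, g_left, g_right))  # one learner per iteration
    return model

def predict(model, X, K):
    F = np.zeros((X.shape[0], K))
    for j, t, g_left, g_right in model:
        F += np.where((X[:, j] <= t)[:, None], g_left, g_right)
    return F.argmax(axis=1)
```

In this sketch each boosting iteration stores a single learner whose leaves hold a length-$K$ vector, whereas a 1-vs-all scheme would store $K$ separate learners per iteration.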

Abstract

This paper presents a computationally efficient variant of gradient boosting for multi-class classification and multi-output regression tasks. Standard gradient boosting uses a 1-vs-all strategy for classification tasks with more than two classes. Under this strategy, one tree per class must be trained at every iteration. In this work, we propose the use of multi-output regressors as base models to handle the multi-class problem as a single task. In addition, the proposed modification allows the model to learn multi-output regression problems. An extensive comparison with other multi-output based gradient boosting methods is carried out in terms of generalization and computational efficiency. The proposed method showed the best trade-off between generalization ability and training and prediction speeds.

Paper Structure (12 sections, 17 equations, 5 figures, 8 tables, 1 algorithm)

Introduction
Related work
Methodology
Gradient Boosted Decision Trees
Condensed Gradient Boosting
Illustrative examples
Experiments
Compared methods
Experimental setup
Evaluation
Results
Conclusion

Figures (5)

Figure 1: Classification boundaries for the C-GB (subplot a) and GB models (subplot b)
Figure 2: Decision tree regressors for the C-GB (a), class zero of GB (b), first class of GB (c), and last class of GB (d) for a multi-class classification problem with three classes
Figure 3: The precision curves for C-GB (blue) and GB (red) are depicted for each class across boosting epochs, showcasing the impact of varying maximum depth values on the decision tree
Figure 4: Comparison of different gradient boosting models (higher rank is better) using the Nemenyi test ($p = 0.05$)
Figure 5: Scatter diagram of predictions for six outputs of the ATP7D dataset and three models: C-GB (red dots), GB (dark blue dots), and GBDT-MO (yellow dots)
