Online Gradient Boosting Decision Tree: In-Place Updates for Efficient Adding/Deleting Data

Huawei Lin, Jun Woo Chung, Yingjie Lao, Weijie Zhao

TL;DR

This work tackles the limitation of batch-only training in gradient boosting decision trees by introducing an in-place online learning framework that supports both incremental and decremental data updates without retraining from scratch. It combines a unified online update mechanism with three optimizations (updating only the contributions of online data, adaptive lazy derivative updates, and split-candidate sampling with robustness controls) and provides a theoretical basis for trading accuracy against computational cost. Empirically, the approach delivers substantial speedups and lower memory usage across 10 public datasets, with additional demonstrations of backdoor data injection and removal. The result is a practical, scalable framework that enables continuous adaptation of GBDT models in dynamic data environments, accompanied by open-source code.

Abstract

Gradient Boosting Decision Tree (GBDT) is one of the most popular machine learning models in various applications. However, in the traditional setting, all data must be accessible simultaneously during training: data instances cannot be added or deleted after training. In this paper, we propose an efficient online learning framework for GBDT that supports both incremental and decremental learning. To the best of our knowledge, this is the first work that considers in-place, unified incremental and decremental learning on GBDT. To reduce the learning cost, we present a collection of optimizations for our framework, so that it can add or delete a small fraction of data on the fly. We theoretically show the relationship between the hyper-parameters of the proposed optimizations, which enables trading off accuracy and cost in incremental and decremental learning. The backdoor attack results show that our framework can successfully inject a backdoor into, and remove it from, a well-trained model using incremental and decremental learning, and the empirical results on public datasets confirm the effectiveness and efficiency of our proposed online learning framework and optimizations.

Paper Structure

This paper contains 38 sections, 16 equations, 17 figures, 16 tables, 4 algorithms.

Figures (17)

  • Figure 1: An example of the incremental and decremental learning procedure in the proposed framework. (a) For the node Loan < 31, the current split is still the best after online learning, so the split does not need to change. (b) An already well-trained tree on $D_\textit{tr}$. (c) For the node Auto < 57, the best split has shifted after online learning. (d) Incremental update of derivatives: only the derivatives of data reaching the changed terminal nodes are updated.
  • Figure 2: Observed distance by which the best split changes. The lines represent the average change in best-split distance, and the shaded region is the standard error.
  • Figure 3: The impact of tuning data size on the number of retrained nodes for each iteration in incremental learning.
  • Figure 4: Feature discretization example. All values of a feature are grouped into 8 bins, i.e., each original feature value is assigned to its nearest bin and becomes an integer between 0 and 7.
  • Figure 5: The impact of tuning data size on the number of retrained nodes for each iteration in incremental learning.
  • ...and 12 more figures
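
The captions for Figures 1 and 4 can be illustrated with a small sketch. It is not the paper's implementation: the equal-width binning rule, the simplified split score (squared-loss style, no regularization), and the helper names `discretize`, `update_histograms`, and `best_split` are all assumptions made for illustration. The sketch bins one feature into 8 integer bins, keeps per-bin gradient sums and counts, updates them in place when data is added or deleted, and then checks whether the best split has moved.

```python
import numpy as np

N_BINS = 8  # matches the 8-bin example in the Figure 4 caption


def discretize(values, lo, hi, n_bins=N_BINS):
    """Map raw feature values to integer bin ids in [0, n_bins - 1].

    Assumption: equal-width bins over a fixed range [lo, hi].
    """
    values = np.asarray(values, dtype=float)
    width = (hi - lo) / n_bins
    idx = np.floor((values - lo) / width).astype(int)
    return np.clip(idx, 0, n_bins - 1)


def update_histograms(grad_hist, count_hist, bins, grads, sign):
    """Update per-bin gradient sums and counts in place.

    sign=+1 adds data (incremental), sign=-1 deletes it (decremental).
    """
    np.add.at(grad_hist, bins, sign * np.asarray(grads, dtype=float))
    np.add.at(count_hist, bins, float(sign))


def best_split(grad_hist, count_hist):
    """Return the bin s that maximizes G_L^2/n_L + G_R^2/n_R,
    where bins <= s go left. Hypothetical simplified score."""
    g_left = np.cumsum(grad_hist)[:-1]
    n_left = np.cumsum(count_hist)[:-1]
    g_right = grad_hist.sum() - g_left
    n_right = count_hist.sum() - n_left
    valid = (n_left > 0) & (n_right > 0)
    score = np.full(g_left.shape, -np.inf)
    score[valid] = (g_left[valid] ** 2 / n_left[valid]
                    + g_right[valid] ** 2 / n_right[valid])
    return int(np.argmax(score))


# Toy "training" data for one feature with values in [0, 8).
grad_hist = np.zeros(N_BINS)
count_hist = np.zeros(N_BINS)
train_bins = discretize([0.5, 1.5, 2.5, 5.5, 6.5, 7.5], lo=0.0, hi=8.0)
update_histograms(grad_hist, count_hist, train_bins, [-1, -1, -1, 1, 1, 1], +1)
split_before = best_split(grad_hist, count_hist)

# Incremental step: two new points arrive; only their bins are touched.
new_bins = discretize([3.5, 4.5], lo=0.0, hi=8.0)
update_histograms(grad_hist, count_hist, new_bins, [-1, -1], +1)
split_after_add = best_split(grad_hist, count_hist)

# Decremental step: the same points are deleted again.
update_histograms(grad_hist, count_hist, new_bins, [-1, -1], -1)
split_after_delete = best_split(grad_hist, count_hist)

print(split_before, split_after_add, split_after_delete)
```

In this toy run the best split moves after the two points are added and returns to its original position once they are deleted. This corresponds to case (c) in Figure 1, where the node would need retraining, as opposed to case (a), where the existing split stays the best and is left untouched.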