Online Gradient Boosting Decision Tree: In-Place Updates for Efficient Adding/Deleting Data
Huawei Lin, Jun Woo Chung, Yingjie Lao, Weijie Zhao
TL;DR
This work tackles the limitation of batch-only training in gradient boosting decision trees by introducing an in-place online learning framework that supports both incremental and decremental data updates without retraining from scratch. It combines a unified online update mechanism with optimizations—updating only online data contributions, adaptive lazy derivative updates, and split-candidate sampling with robustness controls—while providing a theoretical basis for trading accuracy against computational cost. Empirically, the approach delivers substantial speedups and lower memory usage across 10 public datasets, with additional demonstrations of backdoor data injection and removal. The result is a practical, scalable framework that enables continuous adaptation of GBDT models in dynamic data environments, accompanied by open-source code.
Abstract
Gradient Boosting Decision Tree (GBDT) is one of the most popular machine learning models across a wide range of applications. However, in the traditional setting, all data must be available at once during training: data instances cannot be added or deleted after the model is trained. In this paper, we propose an efficient online learning framework for GBDT that supports both incremental and decremental learning. To the best of our knowledge, this is the first work to consider in-place, unified incremental and decremental learning on GBDT. To reduce the learning cost, we present a collection of optimizations for our framework so that it can add or delete a small fraction of data on the fly. We theoretically show the relationship between the hyper-parameters of the proposed optimizations, which enables trading off accuracy against cost in incremental and decremental learning. The backdoor attack results show that our framework can successfully inject a backdoor into a well-trained model and remove it again using incremental and decremental learning, and the empirical results on public datasets confirm the effectiveness and efficiency of the proposed online learning framework and optimizations.
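As a rough illustration of what adding or deleting data in place can mean for a single leaf, the sketch below keeps running gradient and Hessian sums and adjusts them when a sample arrives or is removed, instead of recomputing over all data. It is not the authors' implementation: the `LeafStats` class, the `squared_error_derivatives` helper, and the `reg_lambda` parameter are hypothetical names chosen for this example, and the leaf value uses the standard second-order formula -G / (H + lambda). The sketch assumes the tree structure stays fixed and ignores split changes and the derivative updates needed in later trees.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LeafStats:
    """Running gradient/Hessian sums for one leaf (hypothetical helper)."""
    grad_sum: float = 0.0
    hess_sum: float = 0.0

    def add(self, grad: float, hess: float) -> None:
        # Incremental step: fold one new sample's contribution into the sums.
        self.grad_sum += grad
        self.hess_sum += hess

    def remove(self, grad: float, hess: float) -> None:
        # Decremental step: subtract one deleted sample's contribution.
        self.grad_sum -= grad
        self.hess_sum -= hess

    def value(self, reg_lambda: float = 1.0) -> float:
        # Standard second-order leaf weight: -G / (H + lambda).
        return -self.grad_sum / (self.hess_sum + reg_lambda)


def squared_error_derivatives(y_true: float, y_pred: float) -> Tuple[float, float]:
    # For the loss 0.5 * (y_pred - y_true)^2 the gradient is
    # y_pred - y_true and the Hessian is 1.
    return y_pred - y_true, 1.0


# Build the leaf from three samples, all with an initial prediction of 0.
leaf = LeafStats()
for y in [1.0, 2.0, 3.0]:
    leaf.add(*squared_error_derivatives(y, 0.0))
before = leaf.value(reg_lambda=0.0)

# Add one sample in place, then delete it again.
leaf.add(*squared_error_derivatives(6.0, 0.0))
after_add = leaf.value(reg_lambda=0.0)

leaf.remove(*squared_error_derivatives(6.0, 0.0))
after_remove = leaf.value(reg_lambda=0.0)
```

With `reg_lambda=0.0` and squared error, the leaf value is simply the mean target of the samples in the leaf, so adding and then removing the same sample returns the leaf to its earlier value. Only the contribution of the changed sample is touched, which is the intuition behind updating online data contributions alone.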
