UTBoost: Gradient Boosted Decision Trees for Uplift Modeling
Junjie Gao, Xiangyu Zheng, DongDong Wang, Zhixiang Huang, Bangqi Zheng, Kai Yang
TL;DR
The paper introduces two gradient-boosted tree methods for uplift modeling, both designed around the counterfactual nature of individual treatment effects. The first, TDDP, extends uplift tree learning to a boosting ensemble that maximizes treatment-effect heterogeneity via transformed labels. The second, CausalGBM, jointly learns potential outcomes and causal effects within a single second-order gradient boosting framework and uses an efficient approximation for the leaf weights. By building causal information directly into the loss function and adopting a scalable two-variable leaf optimization, the authors report better performance than baselines across multiple large-scale datasets. In the experiments, CausalGBM achieves notable gains in the Qini metric and remains robust across varying data regimes. The implementation, UTBoost, is released under the MIT license. The methods offer practical tools for personalized interventions in domains such as marketing and healthcare, where the goal is to identify the individuals most responsive to a treatment.
Abstract
Uplift modeling comprises a collection of machine learning techniques that help managers predict the incremental impact of specific actions on customer outcomes. Accurately estimating this incremental impact is challenging, however, because it requires determining the difference between two mutually exclusive outcomes for each individual. In our study, we introduce two novel modifications to the established Gradient Boosting Decision Trees (GBDT) technique. Both modifications learn the causal effect sequentially, addressing the counterfactual dilemma. The first changes the ensemble learning method and the second changes the learning objective. Experiments with large-scale datasets validate the effectiveness of our methods, which consistently achieve substantial improvements over baseline models.
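The counterfactual difficulty described above can be illustrated with a transformed-outcome label, a standard construction from the uplift literature. The sketch below is an assumption made for illustration and is not necessarily the transformation used by TDDP or CausalGBM. The function names, the simulated response rates, and the treatment probability p are all hypothetical. Under random assignment with P(W = 1) = p, the label Y* = Y (W - p) / (p (1 - p)) has an expected value equal to the treatment effect, even though only one of the two outcomes is ever observed per individual.

```python
import random


def transformed_outcome(y, w, p):
    """Transformed label Y* = Y * (W - p) / (p * (1 - p)).

    Assumes treatment W was assigned at random with P(W = 1) = p.
    Under that assumption, E[Y* | X] equals the treatment effect at X.
    """
    return y * (w - p) / (p * (1.0 - p))


def simulate_mean_label(n=200_000, p=0.5, base_rate=0.10, true_uplift=0.05, seed=0):
    """Simulate a randomized experiment (hypothetical rates) and average Y*."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = 1 if rng.random() < p else 0          # random treatment assignment
        rate = base_rate + true_uplift * w        # only one outcome is realized
        y = 1 if rng.random() < rate else 0       # observed binary response
        total += transformed_outcome(y, w, p)
    return total / n


estimate = simulate_mean_label()
print(f"mean transformed label: {estimate:.3f} (simulated true uplift: 0.05)")
```

Individual transformed labels are noisy (here they take only the values -2, 0, and 2), which is one reason to fit them with an ensemble rather than read them off directly.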
