UTBoost: Gradient Boosted Decision Trees for Uplift Modeling

Junjie Gao; Xiangyu Zheng; DongDong Wang; Zhixiang Huang; Bangqi Zheng; Kai Yang

TL;DR

The paper tackles uplift modeling with two gradient-boosted tree methods that address the counterfactual nature of individual treatment effects. The first, TDDP, extends uplift tree learning with boosting, using transformed labels to maximize treatment-effect heterogeneity. The second, CausalGBM, jointly learns potential outcomes and causal effects within a single second-order gradient-boosting framework, building causal information directly into the loss function and using an efficient two-variable approximation for the leaf weights. In experiments on multiple large-scale datasets, both methods outperform baseline models, and CausalGBM improves the Qini metric while remaining robust across varying data regimes. The implementation, UTBoost, is released under the MIT license. The methods offer practical uplift modeling tools for personalized interventions in domains such as marketing and healthcare, where the goal is to identify the individuals most responsive to a treatment.
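To make the idea of transformed labels concrete, here is a minimal sketch of the classic transformed-outcome construction for a randomized experiment. This is an assumption for illustration: the summary does not state the exact transformation TDDP uses, and the function name `transformed_outcome`, the treatment probability `p`, and the simulated data are all hypothetical. With treatment indicator $w$ and known treatment probability $p$, the label $z = y\,(w - p) / (p(1 - p))$ has conditional mean equal to the treatment effect, so an ordinary regressor fitted to $z$ targets uplift without ever observing both outcomes for one individual.

```python
import numpy as np


def transformed_outcome(y, w, p):
    """Return z = y * (w - p) / (p * (1 - p)).

    Under random assignment with P(w = 1) = p, E[z | x] equals the
    treatment effect at x. Illustrative only; not taken from the paper.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return y * (w - p) / (p * (1.0 - p))


# Simulated randomized experiment (hypothetical numbers).
rng = np.random.default_rng(0)
n = 200_000
p = 0.5                      # probability of receiving treatment
true_uplift = 0.1            # treatment raises the response rate by 0.1
w = rng.binomial(1, p, size=n)
y = rng.binomial(1, 0.2 + true_uplift * w)

z = transformed_outcome(y, w, p)
estimated_uplift = z.mean()  # sample mean of z estimates the average uplift
```

The transformed label is unbiased but noisy, which is one reason to average it over many individuals, for example within tree leaves.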

Abstract

Uplift modeling comprises a collection of machine learning techniques designed for managers to predict the incremental impact of specific actions on customer outcomes. However, accurately estimating this incremental impact poses significant challenges due to the necessity of determining the difference between two mutually exclusive outcomes for each individual. In our study, we introduce two novel modifications to the established Gradient Boosting Decision Trees (GBDT) technique. These modifications sequentially learn the causal effect, addressing the counterfactual dilemma. Each modification innovates upon the existing technique in terms of the ensemble learning method and the learning objective, respectively. Experiments with large-scale datasets validate the effectiveness of our methods, consistently achieving substantial improvements over baseline models.

Paper Structure

This paper contains 15 sections, 1 theorem, 15 equations, 1 figure, 2 tables, 1 algorithm.

Introduction
Uplift Problem Formulation
Tree Boosting for Treatment Effect Estimation
Ensemble Learning with Transformed Labels
Tree Construction Method
Split Criterion
Causal Gradient Boosting Machine
Learning Objective
Multi-objective Approximation
Greedy Algorithm for Tree Construction
Experiments
Evaluation Protocols
Overall Performance Comparison
Analysis of Ensemble Method
Conclusion

Key Result

Proposition

Minimizing the mean squared errors of $\tau_i$ in the split nodes is equivalent to maximizing the difference between the average uplift within the left and right child nodes, i.e.,
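The expression that completes the proposition is not reproduced in this summary. As a hedged illustration of why such an equivalence can hold, the sketch below checks the standard variance decomposition numerically: for any split of $n$ values into left and right groups of sizes $n_L$ and $n_R$, the total squared error equals the within-child squared error plus $\frac{n_L n_R}{n}(\bar{\tau}_L - \bar{\tau}_R)^2$. Because the total is fixed, the split that minimizes the within-child error is the one that maximizes the size-weighted squared gap between child means. The size weighting and the stand-in values `tau` are assumptions of this sketch; the paper's exact statement may differ.

```python
import numpy as np


def within_sse(left, right):
    """Sum of squared deviations from each child's own mean."""
    return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()


def between_gain(left, right):
    """Size-weighted squared difference between the two child means."""
    n_l, n_r = len(left), len(right)
    return n_l * n_r / (n_l + n_r) * (left.mean() - right.mean()) ** 2


# Hypothetical per-individual effect values and one feature to split on.
rng = np.random.default_rng(0)
tau = rng.normal(size=50)
x = rng.normal(size=50)

tau_sorted = tau[np.argsort(x)]
total_sse = ((tau - tau.mean()) ** 2).sum()

# Evaluate every threshold that leaves at least one sample on each side.
split_points = range(1, len(tau_sorted))
sse = np.array([within_sse(tau_sorted[:k], tau_sorted[k:]) for k in split_points])
gain = np.array([between_gain(tau_sorted[:k], tau_sorted[k:]) for k in split_points])

best_by_sse = int(np.argmin(sse))
best_by_gain = int(np.argmax(gain))
```

In this sketch `sse + gain` equals `total_sse` at every candidate split, so `best_by_sse` and `best_by_gain` select the same threshold.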

Figures (1)

Figure 1: Results for different ensemble methods. The upper and lower rows show TDDP and CausalGBM, respectively; the left and right columns show the training and testing datasets. The two ensemble methods are distinguished by color.

Theorems & Definitions (1)

Proposition
