TSCAN: Context-Aware Uplift Modeling via Two-Stage Training for Online Merchant Business Diagnosis

Hangtao Zhang, Zhe Li, Kairui Zhang

TL;DR

This paper addresses sample selection bias and context-dependent heterogeneity in individual treatment effect (ITE) estimation for online merchant business diagnosis by introducing TSCAN, a two-stage framework that decouples bias mitigation (CAN-U) from direct uplift prediction (CAN-D). A Context-Aware Attention Layer models tripartite interactions among merchant, treatment, and external-context features, while an isotonic output layer lets CAN-D model uplift directly without treatment regularization. Empirical results on two real-world datasets and a live deployment show consistent improvements in AUUC and context-stratified metrics, including a 0.76% increase in merchant orders in a live A/B test. The approach offers practical benefits for personalized marketing in e-commerce, and the authors suggest future work on model compression, latent-context discovery, and cross-domain validation.
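The TL;DR reports gains in AUUC (area under the uplift curve). As a minimal sketch, assuming one common AUUC variant (rank samples by predicted uplift, then, for each top-k prefix, take the difference between the mean treated and mean control outcomes, scaled by k), the metric can be computed as below. The paper's exact normalization is not stated in this summary, and the function names are hypothetical.

```python
import numpy as np


def uplift_curve(y, t, uplift_pred):
    """Cumulative uplift over top-k prefixes, ranked by predicted uplift (descending).

    y: observed outcomes; t: 1 = treated, 0 = control; uplift_pred: model scores.
    Point k is (mean treated outcome - mean control outcome) in the top k, times k.
    """
    order = np.argsort(-np.asarray(uplift_pred, dtype=float), kind="stable")
    y = np.asarray(y, dtype=float)[order]
    t = np.asarray(t, dtype=int)[order]
    n_t = np.cumsum(t)               # treated count in each prefix
    n_c = np.cumsum(1 - t)           # control count in each prefix
    sum_t = np.cumsum(y * t)         # treated outcome sum in each prefix
    sum_c = np.cumsum(y * (1 - t))   # control outcome sum in each prefix
    # A prefix with no treated (or no control) samples contributes a zero rate.
    rate_t = np.divide(sum_t, n_t, out=np.zeros_like(sum_t), where=n_t > 0)
    rate_c = np.divide(sum_c, n_c, out=np.zeros_like(sum_c), where=n_c > 0)
    k = np.arange(1, len(y) + 1)
    return (rate_t - rate_c) * k


def auuc(y, t, uplift_pred):
    """Area under the uplift curve, taken here as the mean of the curve's points."""
    return float(np.mean(uplift_curve(y, t, uplift_pred)))


# Toy data: highly ranked treated samples convert; the only converting
# control sample is ranked last under the "good" scores.
t = [1, 0, 1, 0, 1, 0, 1, 0]
y = [1, 0, 1, 0, 0, 0, 0, 1]
good = [8, 7, 6, 5, 4, 3, 2, 1]
print(auuc(y, t, good) > auuc(y, t, [-s for s in good]))  # True
```

The last point of the curve always covers the whole dataset, so it equals the overall treated-minus-control difference times n regardless of ranking; only the shape of the curve, and hence its area, rewards a model that ranks high-uplift samples first.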

Abstract

A primary challenge in individual treatment effect (ITE) estimation is sample selection bias. Traditional approaches mitigate this bias with treatment regularization techniques such as Integral Probability Metrics (IPM), re-weighting, and propensity score modeling. However, these regularizations may cause undesirable information loss and limit model performance. Furthermore, treatment effects vary across external contexts, and existing methods do not fully capture the interactions between treatments and these contextual features. To address these issues, we propose TSCAN, a Context-Aware uplift model based on a Two-Stage training approach, comprising the CAN-U and CAN-D sub-models. In the first stage, we train an uplift model, CAN-U, which includes IPM and propensity score prediction as treatment regularizations, to generate a complete dataset with counterfactual uplift labels. In the second stage, we train CAN-D, which uses an isotonic output layer to model uplift effects directly, eliminating the reliance on the regularization components. CAN-D adaptively corrects the errors in CAN-U's estimates by reinforcing the factual samples, while avoiding the negative effects of the aforementioned regularizations. Additionally, we introduce a Context-Aware Attention Layer in both stages to model the interactions among treatment, merchant, and contextual features, thereby capturing how treatment effects vary across contexts. We conduct extensive experiments on two real-world datasets to validate the effectiveness of TSCAN. Finally, the deployment of our model for real-world merchant diagnosis on one of China's largest online food ordering platforms validates its practical utility and impact.
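The abstract says CAN-U uses IPM and propensity score prediction as treatment regularizers. A minimal sketch of such a first-stage objective follows, assuming a squared-error factual loss, a linear-kernel MMD as the IPM (this summary does not say which IPM the authors use; Wasserstein distance is another common choice), and hypothetical loss weights `alpha` and `beta`. It is an illustration of the general technique, not the paper's implementation.

```python
import numpy as np


def linear_mmd2(phi, t):
    """Squared MMD with a linear kernel: squared distance between group means.

    phi: (n, d) learned representations; t: 1 = treated, 0 = control.
    """
    phi = np.asarray(phi, dtype=float)
    treated = np.asarray(t).astype(bool)
    diff = phi[treated].mean(axis=0) - phi[~treated].mean(axis=0)
    return float(diff @ diff)


def binary_cross_entropy(p, labels, eps=1e-7):
    """Mean BCE, used here for a propensity head predicting P(t = 1 | x)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    labels = np.asarray(labels, dtype=float)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))


def stage_one_loss(y_hat, y, phi, t, propensity, alpha=1.0, beta=1.0):
    """Hypothetical CAN-U-style objective: factual MSE + alpha * IPM + beta * propensity BCE."""
    factual = float(np.mean((np.asarray(y_hat, float) - np.asarray(y, float)) ** 2))
    return factual + alpha * linear_mmd2(phi, t) + beta * binary_cross_entropy(propensity, t)


# Treated mean 1.0 vs. control mean 2.0 -> squared distance 1.0
print(linear_mmd2([[0.0], [2.0], [1.0], [3.0]], [1, 1, 0, 0]))  # 1.0
```

With a linear kernel the penalty only aligns the first moments (means) of the treated and control representations; richer kernels or a Wasserstein distance constrain more of the two distributions at a higher computational cost.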
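The two-stage idea in the abstract can be sketched as data preparation: first-stage predictions fill in each unit's unobserved (counterfactual) outcome, and the second-stage model is then trained on the completed data with the factual rows weighted more heavily. This is a hypothetical reading under stated assumptions: a binary treatment, and a fixed `factual_weight` as one simple way to "reinforce the factual samples". The paper's actual mechanism may differ, and all function names are illustrative.

```python
import numpy as np


def uplift_labels(t, y, y_cf_hat):
    """Per-unit uplift label y(1) - y(0): observed outcome for the factual arm,
    first-stage (CAN-U-style) prediction for the counterfactual arm."""
    t = np.asarray(t, dtype=int)
    y = np.asarray(y, dtype=float)
    y_cf_hat = np.asarray(y_cf_hat, dtype=float)
    y1 = np.where(t == 1, y, y_cf_hat)
    y0 = np.where(t == 1, y_cf_hat, y)
    return y1 - y0


def stage_two_rows(t, y, y_cf_hat, factual_weight=2.0):
    """Complete dataset with one factual and one counterfactual row per unit.

    Returns (unit index, treatment, outcome label, sample weight); factual
    rows are up-weighted by the hypothetical `factual_weight`.
    """
    t = np.asarray(t, dtype=int)
    n = len(t)
    unit = np.concatenate([np.arange(n), np.arange(n)])
    treatment = np.concatenate([t, 1 - t])
    label = np.concatenate([np.asarray(y, float), np.asarray(y_cf_hat, float)])
    weight = np.concatenate([np.full(n, factual_weight), np.ones(n)])
    return unit, treatment, label, weight


def weighted_mse(pred, label, weight):
    """Second-stage (CAN-D-style) training loss over the completed dataset."""
    pred, label, weight = (np.asarray(a, dtype=float) for a in (pred, label, weight))
    return float(np.sum(weight * (pred - label) ** 2) / np.sum(weight))


t, y, y_cf_hat = [1, 0], [3.0, 1.0], [2.0, 4.0]
print(uplift_labels(t, y, y_cf_hat))  # [1. 3.]
```

In this reading, each factual row outweighs its imputed counterpart, so where the first-stage imputation is wrong the second-stage loss is pulled toward the observed outcome.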

Paper Structure

This paper contains 26 sections, 13 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: The network architecture of CAN-U and CAN-D.
  • Figure 2: The architecture of Context-Aware Attention Layer and the Context-Aware Gate Attention.
  • Figure 3: The architecture of Treatment-Aware Attention Layer and the Treatment-Aware Gate Attention.
  • Figure 4: An illustration of the prediction process for the factual outcome, the counterfactual outcome, and the corresponding uplift.
  • Figure 5: Diagram of TSCAN's two-stage training process and counterfactual sampling, where the black solid line represents the training flow and the blue dotted line represents the prediction flow.
  • ...and 2 more figures
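Figure 2 names a Context-Aware Attention Layer with a gated attention component, but this summary does not describe its internals. The sketch below is a generic gated cross-attention, assumed purely for illustration: a joint merchant-treatment query attends over external-context embeddings, and a sigmoid gate computed from the merchant and treatment features controls how much of the attended context is added to the merchant representation. All weight names and shapes are hypothetical.

```python
import numpy as np


def softmax(x):
    z = x - np.max(x)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def context_gated_attention(merchant, treatment, contexts, w_q, w_k, w_v, w_g):
    """Hypothetical gated cross-attention over external-context features.

    merchant, treatment: (d,) embeddings; contexts: (k, d) context embeddings;
    w_q, w_k, w_v: (d, d) projections; w_g: (2d, d) gate projection.
    Returns the fused (d,) representation and the (k,) attention weights.
    """
    query = (merchant + treatment) @ w_q          # joint merchant-treatment query
    keys = contexts @ w_k                         # (k, d)
    values = contexts @ w_v                       # (k, d)
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)                     # relevance of each context
    attended = weights @ values                   # (d,) context summary
    gate = sigmoid(np.concatenate([merchant, treatment]) @ w_g)  # (d,) in (0, 1)
    return merchant + gate * attended, weights


rng = np.random.default_rng(0)
d, k = 4, 3
m, e_t, ctx = rng.normal(size=d), rng.normal(size=d), rng.normal(size=(k, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
w_g = rng.normal(size=(2 * d, d))
fused, attn = context_gated_attention(m, e_t, ctx, w_q, w_k, w_v, w_g)
print(fused.shape, round(float(attn.sum()), 6))  # (4,) 1.0
```

Adding the gated context to the merchant vector (a residual connection) means a fully closed gate falls back to the context-free merchant representation; whether the paper uses this residual form is not stated here.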