M$^3$TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for Uplift Modeling
Zexu Sun, Xu Chen
TL;DR
M$^3$TN tackles multi-valued uplift modeling by combining a feature representation module built on Multi-gate Mixture-of-Experts (MMoE) with a reparameterization module that models uplift explicitly. This design reduces the number of model parameters. It also keeps the uplift distribution consistent between treatment and control groups through an additive formulation: $\mathbb{E}_{\boldsymbol{x}}[\mu_k] = \mathbb{E}_{\boldsymbol{x}}[\mu_0] + \mathbb{E}_{\boldsymbol{x}}[\tau^k]$. Here $\mu_0$ is the response under control, $\mu_k$ is the response under treatment $k$, and $\tau^k$ is the uplift of treatment $k$. Empirical results on public and production datasets show state-of-the-art uplift performance and improved efficiency. Ablation and complexity analyses highlight the benefits of explicit uplift modeling and of MMoE. The work offers practical uplift solutions for campaigns with multiple incentives and lays groundwork for theoretical analysis of multi-task uplift settings.
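The additive formulation can be illustrated with a minimal sketch. It assumes a shared representation $h(\boldsymbol{x})$ has already been computed and uses hypothetical linear heads. A control head predicts $\mu_0$, one uplift head per treatment predicts $\tau^k$ directly, and $\mu_k$ is derived as $\mu_0 + \tau^k$. The names `linear`, `reparameterized_heads` and `factual_mse`, the toy weights, and the squared-error loss on the observed arm are illustrative assumptions, not the paper's code or training objective.

```python
from typing import List, Sequence, Tuple

Head = Tuple[List[float], float]  # (weights, bias) of a hypothetical linear head


def linear(head: Head, h: Sequence[float]) -> float:
    """Hypothetical linear prediction head: w . h + b."""
    weights, bias = head
    return sum(w * v for w, v in zip(weights, h)) + bias


def reparameterized_heads(h, control_head: Head, uplift_heads: List[Head]):
    """Predict mu_0 and each tau^k directly, then derive mu_k = mu_0 + tau^k."""
    mu0 = linear(control_head, h)
    taus = [linear(head, h) for head in uplift_heads]
    mus = [mu0 + tau for tau in taus]
    return mu0, taus, mus


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def factual_mse(outs, treatments: Sequence[int], ys: Sequence[float]) -> float:
    """Assumed squared-error loss on the observed arm only (t=0 -> mu_0, t=k -> mu_k)."""
    errs = []
    for (mu0, _, mus), t, y in zip(outs, treatments, ys):
        pred = mu0 if t == 0 else mus[t - 1]
        errs.append((pred - y) ** 2)
    return mean(errs)


# Hypothetical parameters: K = 2 treatments, 3-dimensional representation.
control_head: Head = ([0.2, -0.1, 0.4], 0.05)
uplift_heads: List[Head] = [([0.1, 0.0, -0.2], 0.01), ([0.3, 0.2, 0.0], -0.02)]
batch = [[1.0, 0.5, -0.3], [0.2, -1.0, 0.8], [-0.4, 0.3, 0.1]]

outs = [reparameterized_heads(h, control_head, uplift_heads) for h in batch]

# E[mu_k] = E[mu_0] + E[tau^k] holds by construction (linearity of the mean).
for k in range(len(uplift_heads)):
    lhs = mean([mus[k] for _, _, mus in outs])
    rhs = mean([mu0 for mu0, _, _ in outs]) + mean([taus[k] for _, taus, _ in outs])
    print(f"treatment {k + 1}: identity holds = {abs(lhs - rhs) < 1e-12}")

print("factual loss:", factual_mse(outs, treatments=[0, 1, 2], ys=[0.1, 0.3, 0.2]))
```

Because the uplift is read directly from $\tau^k$, it is not obtained as a difference between two separately predicted responses. Each treated sample trains $\tau^k$ only through $\mu_k = \mu_0 + \tau^k$ in this sketch.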
Abstract
Uplift modeling is a technique for predicting the effect of a treatment (e.g., a discount) on an individual's response. Although several methods have been proposed for multi-valued treatments, they are extensions of binary-treatment methods and still have two limitations. First, existing methods calculate uplift from predicted responses, which does not guarantee a consistent uplift distribution between treatment and control groups; in the multi-valued setting, this may also lead to cumulative errors. Second, the number of model parameters grows with the number of prediction heads, which reduces efficiency. To address these issues, we propose a novel \underline{M}ulti-gate \underline{M}ixture-of-Experts based \underline{M}ulti-valued \underline{T}reatment \underline{N}etwork (M$^3$TN). M$^3$TN consists of two components: 1) a feature representation module based on Multi-gate Mixture-of-Experts, which improves efficiency; and 2) a reparameterization module that models uplift explicitly, which improves effectiveness. Extensive experiments demonstrate the effectiveness and efficiency of M$^3$TN.
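The MMoE idea behind the feature representation module can be sketched as follows. This is a generic MMoE with shared experts and one softmax gate per task, not the paper's exact architecture. The ReLU experts, the bias-free layers, the one-gate-per-prediction-head assumption, the toy weights, and the names `mmoe` and `extra_params_per_head` are all illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(xs: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in xs]


def softmax(zs: Sequence[float]) -> List[float]:
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe(x, experts: List[Matrix], gates: List[Matrix]) -> List[List[float]]:
    """One representation per head: sum_e softmax(G_t x)_e * relu(W_e x)."""
    expert_outs = [relu(matvec(W, x)) for W in experts]  # shared by all heads
    dim = len(expert_outs[0])
    reps = []
    for G in gates:  # hypothetical: one gate (n_experts x d_in) per prediction head
        g = softmax(matvec(G, x))
        reps.append([sum(g[e] * expert_outs[e][j] for e in range(len(experts)))
                     for j in range(dim)])
    return reps


def extra_params_per_head(d_in: int, d_out: int, n_experts: int):
    """Bias-free weights added by one more head: (separate tower, MMoE gate)."""
    return d_in * d_out, n_experts * d_in


# Hypothetical toy setup: 2 experts, 2-dim input and output, 2 heads.
experts = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
gates = [[[0.0, 0.0], [0.0, 0.0]],   # uniform gate: averages the two experts
         [[5.0, 0.0], [-5.0, 0.0]]]  # leans strongly toward the first expert
print(mmoe([1.0, 3.0], experts, gates))
print(extra_params_per_head(d_in=64, d_out=32, n_experts=4))  # (2048, 256)
```

In this simplified setting, the experts are shared across all heads. Adding another treatment head therefore costs only a gate of size `n_experts * d_in` rather than a full tower of size `d_in * d_out`. This is one way to read the efficiency argument above.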
