Uplift Modeling Under Limited Supervision

George Panagopoulos, Daniele Malitesta, Fragkiskos D. Malliaros, Jun Pang

TL;DR

This work proposes a graph neural network to diminish the required training set size, relying on graphs that are common in e-commerce data, and develops a two-model neural architecture akin to previous causal effect estimators.

Abstract

Estimating causal effects in e-commerce tends to involve costly treatment assignments, which can be impractical in large-scale settings. Leveraging machine learning to predict such treatment effects without actual intervention is a standard practice to diminish the risk. However, existing methods for treatment effect prediction tend to rely on training sets of substantial size, which are built from real experiments and are thus inherently risky to create. In this work, we propose a graph neural network to diminish the required training set size, relying on graphs that are common in e-commerce data. Specifically, we view the problem as node regression with a restricted number of labeled instances, develop a two-model neural architecture akin to previous causal effect estimators, and test varying message-passing layers for encoding. Furthermore, as an extra step, we combine the model with an acquisition function to guide the creation of the training set in settings with an extremely low experimental budget. The framework is flexible, since each step can be used separately with other models or treatment policies. The experiments on real large-scale networks indicate a clear advantage of our methodology over the state of the art, which in many cases performs close to random, underlining the need for models that can generalize with limited supervision to reduce experimental risks.
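As a rough illustration of the "two-model" idea the abstract refers to, the sketch below fits one outcome model per treatment arm and scores uplift as the difference between their predictions. It is not the paper's method: the one-feature least-squares fit stands in for the GNN, and the names `fit_line`, `two_model_uplift`, `ate`, and the toy `rows` are hypothetical. The difference-in-means ATE assumes randomized treatment assignment.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    a = my - b * mx
    return lambda x: a + b * x

def two_model_uplift(rows):
    """rows: iterable of (feature, treatment, outcome).

    Fits one outcome model on treated rows and one on control rows,
    then returns a function x -> predicted uplift at x.
    """
    treated = [(x, y) for x, t, y in rows if t == 1]
    control = [(x, y) for x, t, y in rows if t == 0]
    mu1 = fit_line(*zip(*treated))  # outcome model for T = 1
    mu0 = fit_line(*zip(*control))  # outcome model for T = 0
    return lambda x: mu1(x) - mu0(x)

def ate(rows):
    """Difference in mean outcome between treated and control rows."""
    y1 = [y for _, t, y in rows if t == 1]
    y0 = [y for _, t, y in rows if t == 0]
    return mean(y1) - mean(y0)

# Toy data (hypothetical): treated follow y = 1 + 2x, control follow y = 1 + x.
rows = [
    (0.0, 1, 1.0), (1.0, 1, 3.0), (2.0, 1, 5.0),
    (0.0, 0, 1.0), (1.0, 0, 2.0), (2.0, 0, 3.0),
]
uplift = two_model_uplift(rows)
```

On this toy data the predicted uplift grows with the feature, so ranking users by `uplift(x)` would prioritize those with larger `x` for treatment.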

Paper Structure

This paper contains 23 sections, 10 equations, 2 figures, 6 tables, 1 algorithm.

Figures (2)

  • Figure 1: Schematic representation of UMGNet. First (a), the bipartite and undirected user-product graph, along with the node features, are injected into the framework. Second (b), node features for users and products are projected to the same latent space through an FC layer and used as input to the GNN model; another FC layer takes the GNN's output and input to predict the regression outcome. Third (c), outputs for $T = 1$ and $T = 0$ are considered separately and injected into two different FC layers to calculate $\text{loss}_y$; the general output of the regression FC is used to calculate $\text{loss}_t$.
  • Figure 2: Uplift of the predicted sets on the MovieLens dataset. Regular $\text{ATE} = 0.457$.
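The data flow described in the Figure 1 caption can be sketched as a single forward pass in NumPy. This is a minimal reading of the caption under several assumptions, not the authors' implementation: a single mean-aggregation step stands in for the GNN, "input" is taken to mean the projected user features, all layer sizes and parameter names (`W_user`, `W_prod`, `W_gnn`, `W_reg`, `w_t1`, `w_t0`) are hypothetical, and $\text{loss}_t$ is omitted because the caption does not define it.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(x_user, x_prod, adj, params):
    """adj: (n_users, n_products) binary matrix of the bipartite graph."""
    # (b) Project user and product features into the same latent space.
    h_u = relu(x_user @ params["W_user"])
    h_p = relu(x_prod @ params["W_prod"])
    # Stand-in for the GNN: each user averages its neighbouring products.
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    m_u = (adj @ h_p) / deg
    z_u = relu(np.concatenate([h_u, m_u], axis=1) @ params["W_gnn"])
    # Regression FC layer over the GNN output concatenated with its input.
    r = relu(np.concatenate([z_u, h_u], axis=1) @ params["W_reg"])
    # (c) Separate output layers for T = 1 and T = 0.
    y1 = (r @ params["w_t1"]).ravel()
    y0 = (r @ params["w_t0"]).ravel()
    return y1, y0

def loss_y(y1, y0, t, y, labeled):
    """Squared error on labeled users, using the head of the observed treatment."""
    pred = np.where(t == 1, y1, y0)
    return float(np.mean((pred[labeled] - y[labeled]) ** 2))

# Hypothetical sizes and random data, only to exercise the shapes.
n_users, n_prod, d_user, d_prod, d_hid = 5, 4, 3, 2, 8
params = {
    "W_user": rng.normal(size=(d_user, d_hid)),
    "W_prod": rng.normal(size=(d_prod, d_hid)),
    "W_gnn": rng.normal(size=(2 * d_hid, d_hid)),
    "W_reg": rng.normal(size=(2 * d_hid, d_hid)),
    "w_t1": rng.normal(size=(d_hid, 1)),
    "w_t0": rng.normal(size=(d_hid, 1)),
}
x_user = rng.normal(size=(n_users, d_user))
x_prod = rng.normal(size=(n_prod, d_prod))
adj = rng.integers(0, 2, size=(n_users, n_prod)).astype(float)
t = np.array([1, 0, 1, 0, 1])          # observed treatment per user
y = rng.normal(size=n_users)           # observed outcome per user
labeled = np.array([0, 2])             # the few users with labels

y1, y0 = forward(x_user, x_prod, adj, params)
uplift = y1 - y0                       # predicted uplift per user
loss = loss_y(y1, y0, t, y, labeled)
```

Only the labeled users contribute to the loss, while every user receives an uplift score, which mirrors the limited-supervision setting the paper targets.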