Structured Learning of Compositional Sequential Interventions

Jialin Yu, Andreas Koukorinis, Nicolò Colombo, Yuchen Zhu, Ricardo Silva

TL;DR

This work poses an explicit model for composition, that is, how the effect of sequential interventions can be isolated into modules, clarifying which data conditions allow for the identification of their combined effect at different units and time steps, and shows the identification properties of the model.

Abstract

We consider sequential treatment regimes where each unit is exposed to combinations of interventions over time. When interventions are described by qualitative labels, such as "close schools for a month due to a pandemic" or "promote this podcast to this user during this week", it is unclear which structural assumptions allow us to generalize behavioral predictions to previously unseen combinations of interventions. Standard black-box approaches mapping sequences of categorical variables to outputs are applicable, but they rely on poorly understood assumptions about how reliable generalization can be obtained, and may underperform under sparse sequences, temporal variability, and large action spaces. To address this, we pose an explicit model for composition, that is, how the effect of sequential interventions can be isolated into modules, clarifying which data conditions allow for the identification of their combined effect at different units and time steps. We show the identification properties of our compositional model, inspired by advances in causal matrix factorization methods. Our focus is on predictive models for novel compositions of interventions rather than matrix completion tasks and causal effect estimation. We compare our approach to flexible but generic black-box models to illustrate how structure aids prediction in sparse data conditions.

Paper Structure (43 sections, 5 theorems, 53 equations, 9 figures, 2 tables)


Key Result

Proposition 1

Let (i) $f(n, d^{1:T}, x^{1:T}, z) := f_d^{\mathsf T}(d^{1:T}) f_{nxz}(n, x^{1:T}, z)$, where the function sequences $f_d(d^{1:T})$ and $f_{nxz}(n, x^{1:T}, z)$ are defined for all $T \in \mathbb N^+$ and have codomain $\mathbb R^{r_g}$; (ii) $f_{dl}(d^{1:T})$ be given as in Eq. (eq:g) with $r_{h_l} = 1$
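
The factorization in condition (i) can be illustrated with a minimal sketch. Only the inner-product structure, an intervention embedding times a unit/history embedding with both in $\mathbb R^{r_g}$, comes from the statement above. The bodies of `f_d` and `f_nxz` below are hypothetical toy stand-ins chosen for illustration (with $r_g = 3$); they are not the paper's parameterization and do not implement Eq. (eq:g).

```python
from typing import List, Sequence


def f_d(d_seq: Sequence[int]) -> List[float]:
    """Toy stand-in for f_d(d^{1:T}): maps an action sequence to R^3.

    Hypothetical features: a constant, the fraction of non-null actions,
    and the most recent action label. Works for any sequence length T >= 1.
    """
    T = len(d_seq)
    return [1.0, sum(1 for d in d_seq if d != 0) / T, float(d_seq[-1])]


def f_nxz(n: int, x_seq: Sequence[float], z: float) -> List[float]:
    """Toy stand-in for f_nxz(n, x^{1:T}, z): maps unit, history, covariate to R^3."""
    return [0.1 * n, sum(x_seq) / len(x_seq), z]


def f(n: int, d_seq: Sequence[int], x_seq: Sequence[float], z: float) -> float:
    """Condition (i): f = f_d(d^{1:T})^T f_nxz(n, x^{1:T}, z), an inner product."""
    u = f_d(d_seq)
    v = f_nxz(n, x_seq, z)
    assert len(u) == len(v), "both factors must share the codomain dimension r_g"
    return sum(a * b for a, b in zip(u, v))


# Example: unit n = 2, actions (0, 1, 1), history (1.0, 2.0, 3.0), covariate z = 0.5.
value = f(2, [0, 1, 1], [1.0, 2.0, 3.0], 0.5)
```

In this form the intervention sequence enters the prediction only through `f_d`, so the two factors can be varied separately: a new action sequence changes `f_d` while the unit-specific factor `f_nxz` is reused unchanged.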

Figures (9)

  • Figure 1: Within unit $n$, actions $D_n^t$ interact with (latent) random effect parameters $\beta_n$ to produce behavior $X_n^t$, represented as a dense graphical model with square vertices denoting interventions $\mathrm{do}(d_n^{1:t})$ (Pearl, 2009; Dawid, 2021). Further assumptions will be required for the identifiability of the impact of interventions and their combination, including how temporal impact takes shape and the number of independent units of observation.
  • Figure 2: Top: 5-run evaluation of test mean squared error on the fully-synthetic (left) and semi-synthetic (right) cases. CSI-3 was removed on the right due to very high errors. Bottom: how errors change as training sizes are increased, CSI-1 vs. GRU-2 (left: fully-synthetic, right: semi-synthetic).
  • Figure 3: Effect of changing $r$ for the fully-synthetic dataset.
  • Figure 4: Examples of reconstruction of training data for CSI-VAE-1 model.
  • Figure 5: Model-based uncertainty quantification of reconstruction for CSI-VAE-1.
  • ...and 4 more figures

Theorems & Definitions (5)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 1
  • Theorem 2