Causal Contrastive Learning for Counterfactual Regression Over Time

Mouad El Bouchattaoui, Myriam Tami, Benoit Lepetit, Paul-Henry Cournède

TL;DR

This paper introduces a new approach to counterfactual regression over time that targets long-term predictions and achieves state-of-the-art counterfactual estimation results on both synthetic and real-world data.

Abstract

Estimating treatment effects over time matters in many domains, including precision medicine, epidemiology, economics, and marketing. This paper introduces a new approach to counterfactual regression over time that emphasizes long-term predictions. Unlike existing models such as the Causal Transformer, our approach highlights the effectiveness of RNNs for long-term forecasting when complemented by Contrastive Predictive Coding (CPC) and Information Maximization (InfoMax). Favoring efficiency, we avoid computationally expensive transformers. Leveraging CPC, our method captures long-term dependencies in the presence of time-varying confounders. Notably, recent models have disregarded the importance of invertible representations, which compromises identification assumptions. To remedy this, we apply the InfoMax principle, maximizing a lower bound on the mutual information between the sequence data and its representation. Our method achieves state-of-the-art counterfactual estimation results on both synthetic and real-world data and marks the first incorporation of Contrastive Predictive Coding into causal inference.
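CPC, as named above, is usually trained with the InfoNCE loss of van den Oord et al. (2018). Under that loss, each context vector has to pick out the encoding of its own future among in-batch negatives. The sketch below is a generic NumPy InfoNCE with a bilinear critic. It is not the authors' implementation. The function name `info_nce_loss`, the matrix `W`, and the array shapes are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(context, future, W):
    """Generic InfoNCE loss with in-batch negatives (illustrative, not the paper's code).

    context: (N, d_c) context vectors, e.g. C_t for N sequences (assumed layout).
    future:  (N, d_z) encodings of each sequence's true future step.
    W:       (d_c, d_z) bilinear critic, score(i, j) = context[i] @ W @ future[j].
    Row i's positive is future[i]; the other N - 1 rows act as negatives.
    """
    scores = context @ W @ future.T                       # (N, N) pairwise scores
    scores = scores - scores.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # cross-entropy, positives on the diagonal

# Toy check with orthonormal future encodings.
N = 8
z = np.eye(N)                                    # future encodings
W = np.eye(N)                                    # identity critic
aligned = info_nce_loss(5.0 * z, z, W)           # each context matches its own future; ~0.046
shuffled = info_nce_loss(5.0 * np.roll(z, 1, axis=0), z, W)  # contexts shifted by one; ~5.046
mi_lower_bound = np.log(N) - aligned             # InfoNCE estimate, capped at log N; ~2.03
```

Aligned contexts give a near-zero loss and a mutual-information estimate close to its ceiling of log N. Mismatched contexts are penalized heavily.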

Paper Structure

This paper contains 67 sections, 9 theorems, 66 equations, 8 figures, 22 tables, 2 algorithms.

Key Result

Proposition 5.1

$I(\mathbf{C}_t^h,\mathbf{C}_t^f)\leq I(\mathbf{H}_{t}, (\mathbf{C}_t^h, \mathbf{C}_t^f))$.
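Proposition 5.1 can be read alongside the standard N-sample InfoNCE bound, $I(X;Y) \geq \log N - \mathbb{E}[\mathcal{L}_{\mathrm{NCE}}]$ (van den Oord et al., 2018). A hedged sketch follows. It assumes the InfoMax term between $\mathbf{C}_t^h$ and $\mathbf{C}_t^f$ is estimated with an InfoNCE-style critic over $N$ samples, which this summary does not state.

$$
\log N - \mathbb{E}\big[\mathcal{L}_{\mathrm{NCE}}(\mathbf{C}_t^h,\mathbf{C}_t^f)\big] \;\leq\; I(\mathbf{C}_t^h,\mathbf{C}_t^f) \;\leq\; I\big(\mathbf{H}_{t}, (\mathbf{C}_t^h, \mathbf{C}_t^f)\big)
$$

Under that assumption, lowering the contrastive loss raises a lower bound on the information the representation retains about $\mathbf{H}_t$. This is the InfoMax lower bound the abstract refers to.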

Figures (8)

  • Figure 1: Causal graph over $\mathbf{H}_{t+1}$
  • Figure 2: Causal CPC architecture: The left shows the encoder, which learns context $\mathbf{C}_t$ from process history $\mathbf{H}_t$, with CPC and InfoMax objectives used for pretraining. The right shows the decoder, which autoregressively predicts the future outcome sequence from $\mathbf{C}_t$.
  • Figure 3: Evolution of the error (NRMSE) in estimating counterfactual responses on the cancer simulation data. Top: training sequence length 60. Bottom: training sequence length 40. In both cases, $\tau=10$. MSM is excluded because of its high prediction errors.
  • Figure 4: Model performance on the cancer simulation, $\gamma=2$, $\tau=15$.
  • Figure 5: Performance on the MIMIC-III semi-synthetic data, sequence length 60.
  • ...and 3 more figures
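Figure 2's encoder/decoder split can be pictured with a toy NumPy sketch. An RNN folds the history into a context vector, and a decoder then rolls forward autoregressively under a chosen treatment plan. Querying two plans from the same context yields two counterfactual trajectories. Everything here is a hypothetical stand-in:

- a vanilla tanh RNN replaces whatever recurrent cell the paper uses;
- the weights are random rather than trained;
- the input layout (outcome in column 0) is assumed;
- the CPC/InfoMax pretraining is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_encode(x_seq, Wx, Wh, b):
    """Vanilla tanh RNN over the history; the final hidden state stands in for C_t.
    x_seq: (T, d_x) per-step features (hypothetical layout)."""
    h = np.zeros(Wh.shape[0])
    for x in x_seq:
        h = np.tanh(Wx @ x + Wh @ h + b)
    return h

def decode(context, y_last, treatments, Wd, Wy, Wa, wo):
    """Autoregressive decoder: each step feeds back its previous outcome prediction
    and the planned treatment for that step; returns one prediction per step."""
    h, y, preds = context, y_last, []
    for a in treatments:                    # one planned treatment vector per future step
        h = np.tanh(Wd @ h + Wy * y + Wa @ a)
        y = wo @ h                          # scalar outcome prediction
        preds.append(y)
    return np.array(preds)

T, tau, d_x, d_h, d_a = 20, 5, 3, 8, 2
history = rng.normal(size=(T, d_x))
enc = [rng.normal(scale=0.3, size=s) for s in [(d_h, d_x), (d_h, d_h), (d_h,)]]
dec = [rng.normal(scale=0.3, size=s) for s in [(d_h, d_h), (d_h,), (d_h, d_a), (d_h,)]]

C_t = rnn_encode(history, *enc)
y_last = history[-1, 0]                            # assume column 0 holds the observed outcome
treat_plan = np.tile([1.0, 0.0], (tau, 1))         # hypothetical plan: treatment 1 at every step
no_treat = np.zeros((tau, d_a))                    # hypothetical plan: no treatment
y_treat = decode(C_t, y_last, treat_plan, *dec)    # trajectory under the treatment plan
y_none = decode(C_t, y_last, no_treat, *dec)       # trajectory with no treatment
```

Because the decoder reuses one context vector, the two trajectories differ only through the treatment inputs. This is the sense in which a sequence-to-sequence model answers "what if" queries over a horizon $\tau$.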

Theorems & Definitions (16)

  • Proposition 5.1
  • Theorem 5.2
  • Theorem 5.3
  • Theorem 5.4
  • Proposition B.4
  • proof
  • Proposition G.1
  • proof
  • proof
  • Proposition G.2
  • ...and 6 more