Feature Matching Intervention: Leveraging Observational Data for Causal Representation Learning

Haoze Li, Jun Xie

TL;DR

The paper introduces Feature Matching Intervention (FMI), a covariate-matching strategy that emulates perfect interventions in the latent causal graph to identify true causal features from a single training environment. It formalizes a causal latent-graph framework, provides a theoretical minimax guarantee for FMI under a set of structural assumptions, and proposes a validation-based workflow to detect when the learned feature is spurious. Empirical results on synthetic data, Colored MNIST, and WaterBirds show FMI outperforms standard ERM and invariance-based methods, demonstrating strong OOD generalization and feature fidelity. The work advances causal representation learning by enabling intervention-like identifiability without requiring multiple environments, with future directions toward handling multiple spurious features and broader covariate-shift settings.

Abstract

A major challenge in causal discovery from observational data is the absence of perfect interventions, making it difficult to distinguish causal features from spurious ones. We propose an innovative approach, Feature Matching Intervention (FMI), which uses a matching procedure to mimic perfect interventions. We define causal latent graphs, extending structural causal models to latent feature space, providing a framework that connects FMI with causal graph learning. Our feature matching procedure emulates perfect interventions within these causal latent graphs. Theoretical results demonstrate that FMI exhibits strong out-of-distribution (OOD) generalizability. Experiments further highlight FMI's superior performance in effectively identifying causal features solely from observational data.

Paper Structure

This paper contains 51 sections, 6 theorems, 25 equations, 21 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

Under Assumptions 1-4, any solution to the FMI objective (eq:FMI) achieves the minimax risk defined in Formula (eq:2). Therefore, FMI offers OOD generalization.
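
For orientation, the minimax risk that Theorem 1 refers to is commonly written in the OOD-generalization literature as a worst-case risk over environments. The display below is a sketch of that standard form, not a reproduction of the paper's formula: the symbols $\mathcal{E}$ (set of environments), $R^e$ (risk in environment $e$), $\ell$ (loss), and $f$ (predictor) are assumed notation.

```latex
\min_{f} \; \max_{e \in \mathcal{E}} \; R^{e}(f),
\qquad
R^{e}(f) = \mathbb{E}_{(X^{e}, Y^{e})}\!\left[ \ell\big(f(X^{e}), Y^{e}\big) \right]
```

Under this reading, the theorem says that a predictor obtained from the FMI objective is as good as any predictor can be in the worst environment, even though it is trained on a single one.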

Figures (21)

  • Figure 1: Possible latent DAGs: (a) corresponds to the FIIF case and (b) corresponds to the PIIF case. Solid arrows represent functional relationships, dashed arrows represent statistical dependence, and dotted arrows represent functional relationships between the observed variable and the latent variables.
  • Figure 2: Illustration of the matching approach: In the training environment, the Bayes classifier classifies green images as 1 and red images as 0. Although it achieves a smaller risk than a classifier based on the true feature (digit shape), it performs poorly in other environments. FMI subsamples the original training environment according to a different distribution and thereby balances the spurious feature (color). Under this new distribution, the Bayes classifier is expected to be based on the true feature.
  • Figure 3: The workflow of FMI: Given training data, we conduct a hypothesis test using a validation environment. If we reject $H_0$, we apply FMI to learn the true feature; otherwise, we use the ERM-learned feature.
  • Figure 4: Illustration of the FMI workflow on Colored MNIST. Before matching, most images of digits 0-4 are green and most images of digits 5-9 are red. After matching, the correlation between color (spurious feature) and digit class (target) disappears.
  • Figure 5: Plots of p-values for testing $Y^e|\hat{f}^e = 0$ in the training environment ($e_0 = 0.1$) and validation environment ($e = 0.9$) of Colored MNIST given different features. In each plot, the feature learned in the training environment is colored orange and the feature learned by FMI is colored blue. The y-axis in each plot represents the p-value of the goodness-of-fit test. The dashed red lines represent significance level $0.05$. The error bars are obtained by repeating the experiment ten times.
  • ...and 16 more figures
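
The balancing effect described in the captions of Figures 2 and 4 can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a binary label and an already-known binary spurious feature, and it simply subsamples every (label, spurious) cell down to the size of the smallest cell. The names `balanced_subsample` and `pearson` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def balanced_subsample(labels, spurious, seed=0):
    """Return sorted indices in which every (label, spurious) cell has equal size.

    Illustrative stand-in for matching: equalising the cells removes the
    association between the spurious feature and the label in the subsample.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for i, (y, s) in enumerate(zip(labels, spurious)):
        cells[(y, s)].append(i)
    # Every combination must be observed, otherwise balancing is impossible.
    expected = {(y, s) for y in set(labels) for s in set(spurious)}
    if set(cells) != expected:
        raise ValueError("some (label, spurious) cell is empty; cannot balance")
    n = min(len(ix) for ix in cells.values())
    keep = []
    for ix in cells.values():
        keep.extend(rng.sample(ix, n))
    return sorted(keep)


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Toy training environment: the spurious feature agrees with the label
# about 90% of the time (loosely analogous to color in Colored MNIST).
rng = random.Random(1)
labels = [rng.randint(0, 1) for _ in range(2000)]
spurious = [y if rng.random() < 0.9 else 1 - y for y in labels]

idx = balanced_subsample(labels, spurious)
corr_before = pearson(labels, spurious)
corr_after = pearson([labels[i] for i in idx], [spurious[i] for i in idx])
```

In this toy setting the spurious feature is strongly correlated with the label before subsampling and uncorrelated afterwards, so a classifier fitted on the subsample has no incentive to rely on it. In practice the spurious feature is not observed directly, which is why the workflow in Figure 3 works with a learned feature and a validation environment.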

Theorems & Definitions (13)

  • Definition 1: Causal latent graph
  • Example 1
  • Theorem 1
  • Definition 2: Validation environment for feature
  • Proposition 1
  • Theorem 2
  • Lemma 1
  • Corollary 1
  • Lemma 2
  • Definition 3
  • ...and 3 more