Unifying Causal Representation Learning with the Invariance Principle

Dingling Yao; Dario Rancati; Riccardo Cadei; Marco Fumero; Francesco Locatello

TL;DR

This work introduces a unifying invariance-based framework for causal representation learning (CRL), showing that many CRL methods effectively align representations with data symmetries rather than adhering to a strict causal hierarchy. By formalizing invariance properties on latent blocks and designing encoders and selectors that enforce invariance and sufficiency, the authors achieve block-identifiability of invariant latent components and clarify when variant latents remain unidentified. The theory subsumes multiview, interventional, temporal, multi-task, and domain-generalization CRL as special cases and demonstrates how different intervention regimes affect identifiability. Empirically, the framework improves real-world treatment effect estimation on ISTAnt, and a synthetic ablation with "ninterventions" indicates that a non-causal invariance can be sufficient for identification, underscoring the practical value of leveraging symmetries. Overall, the paper reframes CRL around invariances, providing a flexible, broadly applicable blueprint for discovering causal variables and deploying robust downstream predictors.
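To make the invariance idea concrete, here is a minimal sketch of an invariance penalty on a selected latent block. It is an illustration under stated assumptions, not the authors' implementation: the function name `invariance_penalty`, the array layout, and the use of a squared-difference penalty are all hypothetical choices. It covers only the invariance side; a sufficiency term (for example, one that keeps the encoder informative) would be needed in addition.

```python
import numpy as np

def invariance_penalty(encodings, selector):
    """Mean squared disagreement between views on a selected latent block.

    encodings: array of shape (K, N, D) with K >= 2 views, N samples,
               D latent dimensions (hypothetical layout).
    selector:  binary array of shape (D,); 1 marks dimensions assumed
               to be invariant across the views (hypothetical selector).
    """
    encodings = np.asarray(encodings, dtype=float)
    selector = np.asarray(selector, dtype=float)
    # Zero out every dimension outside the selected block.
    selected = encodings * selector
    num_views = selected.shape[0]
    total, pairs = 0.0, 0
    # Average the squared distance over all unordered pairs of views.
    for i in range(num_views):
        for j in range(i + 1, num_views):
            diff = selected[i] - selected[j]
            total += np.mean(np.sum(diff ** 2, axis=-1))
            pairs += 1
    return total / pairs

# Two views that agree on dimension 0 and disagree on dimension 1.
view_a = np.array([[1.0, 2.0], [3.0, 4.0]])
view_b = np.array([[1.0, 5.0], [3.0, 0.0]])
views = np.stack([view_a, view_b])

penalty_invariant_block = invariance_penalty(views, [1, 0])
penalty_variant_block = invariance_penalty(views, [0, 1])
```

Selecting the dimension on which the views agree gives a zero penalty, while selecting the disagreeing dimension gives a positive one, which is the signal a learned selector would use to isolate the invariant block.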

Abstract

Causal representation learning (CRL) aims at recovering latent causal variables from high-dimensional observations to solve causal downstream tasks, such as predicting the effect of new interventions or more robust classification. A plethora of methods have been developed, each tackling carefully crafted problem settings that lead to different types of identifiability. These different settings are widely assumed to be important because they are often linked to different rungs of Pearl's causal hierarchy, even though this correspondence is not always exact. This work shows that instead of strictly conforming to this hierarchical mapping, many causal representation learning approaches methodologically align their representations with inherent data symmetries. Identification of causal variables is guided by invariance principles that are not necessarily causal. This result allows us to unify many existing approaches in a single method that can mix and match different assumptions, including non-causal ones, based on the invariance relevant to the problem at hand. It also significantly benefits applicability, which we demonstrate by improving treatment effect estimation on real-world high-dimensional ecological data. Overall, this paper clarifies the role of causal assumptions in the discovery of causal variables and shifts the focus to preserving data symmetries.

Paper Structure

This paper contains 35 sections, 15 theorems, 58 equations, 4 figures, 3 tables.

Introduction
Problem Setting
Identifiability Theory via the Invariance Principle
Related Works as Special Cases of Our Theory
Experiments
Case Study: ISTAnt
Synthetic Ablation with "Ninterventions"
Conclusion
Appendix
Notation and Terminology
Preliminaries
Identifiability Theory
On the granularity of latent variable identification
Identifying the causal graph
Related Works
...and 20 more sections

Key Result

Theorem 3.1

Consider a set of observables ${\mathcal{S}}_{\mathbf{x}} = \{\mathbf{x}^1, \mathbf{x}^2, \dots, \mathbf{x}^K\} \in \mathcal{X}$ generated from the data-generating process (sec:dgp) and satisfying assmp:iota_observation_relation. Let $G, \Phi$ be the set of smooth encoders (defn:encoders) and selectors (defn:block_selectors) that sa...

Figures (4)

Figure 1: TERB and Balanced Accuracy with standard deviation over 20 different seeds, varying the invariance weight $\lambda_{\text{INV}}$ of V-REx (krueger2021out) on the ISTAnt dataset (cadei2024smoke). Stars represent the best models, selected on a small but heterogeneous validation set.
Figure 2: Relations between different identification classes (defn:blockID, defn:crlID, defn:affineID, defn:block_affineID). Some CRL works proposed a more fine-grained classification of identifiability concepts with slightly different terminology, which we omit here for readability.
Figure 3: Causal model for a generic partially annotated scientific experiment: $T$ treatment, $\bm{W}$ experimental settings, $\bm{X}$ high-dimensional observation, $Y$ outcome, $S$ annotation flag. Figure and caption adapted from cadei2024smoke.
Figure 4: Examples of high-dimensional observations $\bm{X}$ with corresponding annotated social behaviour $Y$ (grooming). Figure and caption adapted from cadei2024smoke.
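
Figure 1 varies the invariance weight $\lambda_{\text{INV}}$ of V-REx. As a rough sketch of what that weight controls, the V-REx objective adds a penalty on the variance of the per-environment risks to the aggregate risk. The code below is an illustration only: the function name `vrex_objective` and the risk values are hypothetical, and it uses the mean of the risks where the original V-REx formulation uses their sum (the two differ by a constant scaling).

```python
import numpy as np

def vrex_objective(env_risks, lambda_inv):
    """Aggregate risk plus a variance penalty across environments.

    env_risks:  one scalar risk per training environment.
    lambda_inv: invariance weight; 0 recovers plain risk minimization.
    """
    env_risks = np.asarray(env_risks, dtype=float)
    mean_risk = env_risks.mean()
    # Population variance of the risks across environments.
    risk_variance = env_risks.var()
    return mean_risk + lambda_inv * risk_variance

# Hypothetical risks from two environments.
risks = [1.0, 3.0]
objective_no_penalty = vrex_objective(risks, lambda_inv=0.0)
objective_with_penalty = vrex_objective(risks, lambda_inv=2.0)
```

Larger values of `lambda_inv` push the model toward equal risk in every environment, at the possible cost of a higher average risk, which is the trade-off the sweep in Figure 1 explores.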

Theorems & Definitions (30)

Definition 2.1: Invariance property
Definition 2.2: Observable of a set of latent random vectors
Definition 3.1: Block-identifiability (von2021self)
Definition 3.2: Encoders
Definition 3.3: Selection (yao2023multi)
Definition 3.4: Invariant block selectors
Theorem 3.1: Identifiability of multiple invariant blocks
Proposition 3.1: General non-identifiability of variant latent variables
Proposition 3.1: Identifiability of variant latent under independence
Definition 5.1: Nintervention
...and 20 more
