Causal Representation Learning Made Identifiable by Grouping of Observational Variables

Hiroshi Morioka; Aapo Hyvärinen

TL;DR

This paper addresses identifiability in Causal Representation Learning (CRL) by introducing a grouping of observational variables, enabling identifiability without temporal structure, supervision, or interventions. It proposes Grouped Causal Representation Learning (G-CaRL), a self-supervised framework that learns group-wise inverse mappings to recover latent causal variables and jointly estimates inter-group causal weights, with provable consistency under suitable assumptions. Theoretical results guarantee identifiability of the latent variables up to permutations and variable-wise invertible transformations, and identifiability (up to scaling and transpose) of inter-group causal graphs under directed, nondegenerate structures. Empirically, G-CaRL outperforms state-of-the-art baselines on synthetic DAGs and cycles, gene-regulatory-like networks, and high-dimensional image data, and it is robust to latent confounders and model misspecification. The work broadens the applicability of CRL by allowing instantaneous, nonlinear, and potentially cyclic interactions without supervision, interventions, or strict dynamics, with potential applications in multimodal sensing, biology, and neuroscience.

Abstract

A topic of great current interest is Causal Representation Learning (CRL), whose goal is to learn a causal model for hidden features in a data-driven manner. Unfortunately, CRL is severely ill-posed, since it combines two notoriously ill-posed problems: representation learning and causal discovery. Yet finding practical identifiability conditions that guarantee a unique solution is crucial for its practical applicability. Most approaches so far have relied on assumptions about the latent causal mechanisms, such as temporal causality, or on the existence of supervision or interventions; these can be too restrictive in actual applications. Here, we show identifiability based on novel, weak constraints that require no temporal structure, interventions, or weak supervision. The approach assumes that the observational mixing exhibits a suitable grouping of the observational variables. We also propose a novel self-supervised estimation framework consistent with the model, prove its statistical consistency, and show experimentally that its CRL performance is superior to that of state-of-the-art baselines. We further demonstrate its robustness against latent confounders and causal cycles.
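The following minimal simulation sketch makes the grouping assumption concrete. It assumes, purely for illustration, two things:

- Each group of observations $\mathbf{x}^m$ is an invertible function of its own latent group $\mathbf{s}^m$ only. Here that function is a random lower-triangular linear map followed by a leaky ReLU.
- Latent variables in later groups depend on those in earlier groups through a hypothetical nonlinear structural equation with Gaussian noise.

These modeling choices and all function names (`sample_grouped_model`, `random_invertible_mixing`, `leaky_relu`) are our own. The paper's observation model and its pairwise Bayesian-network causal model are more general and also allow cycles, which this DAG-only sketch does not.

```python
import math
import random


def leaky_relu(v, slope=0.2):
    # Strictly increasing, hence invertible, elementwise nonlinearity.
    return v if v >= 0 else slope * v


def random_invertible_mixing(dim, rng):
    # Lower-triangular matrix with positive diagonal: always invertible.
    return [[(rng.uniform(0.5, 1.5) if i == j else rng.gauss(0.0, 0.5)) if j <= i else 0.0
             for j in range(dim)] for i in range(dim)]


def sample_grouped_model(n, num_groups, dim, edge_prob=0.5, seed=0):
    """Hypothetical toy version of a grouped CRL generative model (illustration only)."""
    rng = random.Random(seed)
    # Assumed inter-group causal weights: edges only go from an earlier group to a
    # later one, so the latent graph is a DAG with no intra-group edges.
    weights = {}  # (src_group, src_idx, dst_group, dst_idx) -> weight
    for dst in range(num_groups):
        for src in range(dst):
            for i in range(dim):
                for j in range(dim):
                    if rng.random() < edge_prob:
                        weights[(src, j, dst, i)] = rng.uniform(-1.0, 1.0)
    # One invertible observation function per group (the grouping assumption).
    mixings = [random_invertible_mixing(dim, rng) for _ in range(num_groups)]
    samples = []
    for _ in range(n):
        s = [[0.0] * dim for _ in range(num_groups)]
        for dst in range(num_groups):
            for i in range(dim):
                value = rng.gauss(0.0, 1.0)  # independent noise term
                for src in range(dst):
                    for j in range(dim):
                        w = weights.get((src, j, dst, i))
                        if w is not None:
                            value += w * math.tanh(s[src][j])
                s[dst][i] = value
        # x^m depends on s^m only: mix within the group, then apply leaky ReLU.
        x = [[leaky_relu(sum(a_ij * s_j for a_ij, s_j in zip(row, s[m])))
              for row in mixings[m]]
             for m in range(num_groups)]
        samples.append((s, x))
    return samples, weights
```

For example, `samples, weights = sample_grouped_model(1000, num_groups=4, dim=3)` draws 1000 samples with $M = 4$ groups. Because each observation group is produced by its own invertible map, inverting that map group by group would return the latent group; this is the structure a group-wise inverse mapping is meant to exploit.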

Paper Structure (66 sections, 10 theorems, 75 equations, 9 figures, 1 algorithm)

Introduction
Related Works
Model Definition
Observation Model
Illustrative Example 1: Causally Related Sensor Measurements
Illustrative Example 2: Causal Dynamics
Causally Structured Latent Variable
Illustrative Example 1: Causally Related Sensor Measurements
Illustrative Example 2: Causal Dynamics
Identifiability of Representation Learning
Representation Learning Algorithm
Identifiability of Causal Discovery
Experiments
Simulation 1: DAG
Simulation 2: Cyclic Graphs with Latent Confounders
...and 51 more sections

Key Result

Theorem 1

Assume the generative model given by Eqs. (eq:f) and (eq:ps), together with the following assumptions: … Then, for every group $m$ in $\mathcal{M}$ (or in the groups of interest), $\mathbf{s}^m$ can be recovered up to permutation and variable-wise invertible transformations from the distribution of $\mathbf{x}$.
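Written out, the indeterminacy in Theorem 1 can be stated as follows. This is our own restatement. The dimension $d_m$ of group $m$, the permutation $\pi_m$, and the functions $h^m_i$ are notation introduced here, not taken from the paper. If $\hat{\mathbf{s}}^m$ denotes the estimate for group $m$, then

$$
\hat{s}^m_i = h^m_i\!\left(s^m_{\pi_m(i)}\right), \qquad i = 1, \dots, d_m,
$$

for some permutation $\pi_m$ of $\{1, \dots, d_m\}$ and some invertible scalar functions $h^m_i : \mathbb{R} \to \mathbb{R}$. In other words, each estimated component corresponds to exactly one true latent variable of the same group. Only the ordering of the components and a component-wise reparameterization remain undetermined.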

Figures (9)

Figure 1: Comparison of the graphical models of major CRL frameworks whose goal is to estimate latent causal variables $\mathbf{s}$ from the low-level observations $\mathbf{x}$, usually with supervision or intervention $\mathbf{u}$. Our proposal in (c) is based on the grouping of variables (Eq. (eq:f); $M=4$ groups here) and a causal model based on a pairwise BN (Eq. (eq:ps)). It does not require any supervision or intervention, and thus greatly generalizes the existing models.
Figure 2: Comparison of CRL performance by the proposed G-CaRL and the baselines. The performances are measured by correlation for the latent variables, and by F1-score for the causal graphs, excluding the intra-group sub-graphs. The parentheses after the names of some (C)RL frameworks indicate the causal discovery frameworks additionally applied as post-processing.
Figure 3: (a) Illustrative description of the co-parents and co-children. (b–d) Illustrative examples of causal graphs that do or do not satisfy Assumption An3 (Theorem thm:cd3) and/or Assumption An2 (Proposition thm:cd2).
Figure 4: Comparison of a set of causal discovery frameworks (columns in each panel; measured by F1-score) applied to the baseline representation learning frameworks (rows), or directly to the latent variables (the last row: Latent; omitted in d since it is the same as in b). Some causal discovery frameworks (shaded in grey) were discarded on some panels because they did not converge within a practical computation time. (a) Simulation 1, (b) Simulation 2, (c) gene regulatory network recovery, and (d) high-dimensional image observations. Only the best performance for each panel is reported in Fig. 2.
Figure 5: Estimation performance for the latent variables (Pearson correlation) and the causal structures (F1-score) of the proposed framework G-CaRL under different settings of (Left) the complexity of the observation model (the number of MLP layers $L$ of the observation function $\mathbf{f}$), (Middle) the number of groups ($M$), and (Right) the number of variables ($D_\mathcal{S}$), each as a function of the number of samples $n$. (a) Simulation 1 (basic DAG). (b) Simulation 2 (cycles and latent confounders). The values are averages over 10 runs for each setting, and the shaded regions show the standard deviations. Fig. 2a corresponds to the case $L=3$, $M=3$, $D_\mathcal{S}=30$, and $n=2^{16}$ in a, and Fig. 2b corresponds to the case $L=3$, $M=3$, $D_\mathcal{S}=30$, and $n=2^{20}$ in b.
...and 4 more figures
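The figures above score latent recovery by (Pearson) correlation and graph recovery by F1-score, excluding intra-group sub-graphs. The sketch below shows one common way to compute such scores. It is an assumption, not the paper's evaluation code.

- **Matching components.** Theorem 1 identifies the latents only up to permutation, so estimated components are first matched to true ones. The matching uses the permutation that maximizes the mean absolute correlation. It is found by brute force here; for larger dimensions an assignment solver (Hungarian algorithm) would be used instead.
- **Limits of Pearson correlation.** It captures only the linear part of the variable-wise invertible transformations.
- **Hypothetical names.** The edge-set representation and the `group_of` mapping are our own.

```python
import itertools
import math


def pearson(a, b):
    # Sample Pearson correlation between two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def matched_mean_abs_corr(true_cols, est_cols):
    """Mean |Pearson r| after the best one-to-one matching of estimated to true
    components (brute force over permutations, so only for small dimensions)."""
    d = len(true_cols)
    corr = [[abs(pearson(true_cols[i], est_cols[j])) for j in range(d)] for i in range(d)]
    best = max(sum(corr[i][perm[i]] for i in range(d))
               for perm in itertools.permutations(range(d)))
    return best / d


def inter_group_f1(true_edges, est_edges, group_of):
    """F1-score over directed edges (u, v), ignoring edges whose endpoints lie in
    the same group; group_of is a hypothetical map from variable to group index.
    Returns 0.0 when there are no true positives."""
    def keep(edges):
        return {(u, v) for (u, v) in edges if group_of[u] != group_of[v]}

    t, e = keep(true_edges), keep(est_edges)
    tp = len(t & e)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(e), tp / len(t)
    return 2 * precision * recall / (precision + recall)
```

Here is a worked example of the F1 score. Take groups {0, 1} and {2, 3}, true edges {(0, 2), (1, 3), (0, 1)}, and estimated edges {(0, 2), (2, 1), (0, 1)}. The intra-group edge (0, 1) is ignored, one of two remaining edges matches on each side, and the inter-group F1-score is 0.5.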

Theorems & Definitions (21)

Theorem 1
Theorem 2
Theorem 3
Definition 1
Definition 2
Lemma 1
proof
proof
Lemma 2
proof
...and 11 more
