General Identifiability and Achievability for Causal Representation Learning

Burak Varıcı; Emre Acartürk; Karthikeyan Shanmugam; Ali Tajer

TL;DR

This work extends causal representation learning to a general nonparametric latent model with a diffeomorphic transformation to observations, and shows that two hard interventions per latent node suffice for perfect identifiability of the latent DAG and latent variables even when the environment-to-node mapping is unknown (uncoupled interventions). It introduces GSCALE-I, a score-based algorithm that leverages interventional score variations to recover the inverse transformation and latent factors, providing provable guarantees under both uncoupled and coupled settings and without faithfulness assumptions when observational data are available. The paper also proves identifiability results under coupled interventions with weaker discrepancy requirements and demonstrates the approach via synthetic experiments that confirm high recovery accuracy, while highlighting the importance of accurate score estimation. Overall, the results offer a constructive path to identifiability and practical latent-variable recovery in CRL under very general, nonparametric conditions, and establish a foundation for further reducing intervention requirements and enhancing score-estimation methods in real data scenarios.

Abstract

This paper focuses on causal representation learning (CRL) under a general nonparametric latent causal model and a general transformation model that maps the latent data to the observational data. It establishes identifiability and achievability results using two hard uncoupled interventions per node in the latent causal graph. Notably, one does not know which pair of intervention environments has the same node intervened on (hence, uncoupled). For identifiability, the paper establishes that perfect recovery of the latent causal model and variables is guaranteed under uncoupled interventions. For achievability, an algorithm is designed that uses observational and interventional data and recovers the latent causal model and variables with provable guarantees. This algorithm leverages score variations across different environments to estimate the inverse of the transformation and, subsequently, the latent variables. The analysis additionally recovers the identifiability result for two hard coupled interventions, that is, when metadata about which pair of environments has the same node intervened on is known. This paper also shows that when observational data are available, the additional faithfulness assumptions adopted in the existing literature are unnecessary.
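
A minimal numerical sketch of why score variations can pin down the inverse transformation, under assumptions that are not from the paper: the latent model is a hypothetical linear Gaussian SCM on the chain $Z_0 \to Z_1 \to Z_2$, the transformation is a square, invertible linear map $X = GZ$, and all scores are computed in closed form rather than estimated. For such a map, $\nabla_x \log p_X(x) = G^{-\top} \nabla_z \log p_Z(z)$, so the score difference between the observational environment and a hard intervention on $Z_1$ becomes sparse (supported on $Z_1$ and its parent $Z_0$) when pulled back with the true encoder $G^{-1}$, and generically dense with a wrong encoder. All helper names are illustrative; this is not GSCALE-I, which handles nonparametric models and general transformations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Hypothetical linear Gaussian SCM Z0 -> Z1 -> Z2: Z_k = B[k] @ Z + c[k] + noise of variance d[k].
B = np.zeros((n, n))
B[1, 0], B[2, 1] = 0.9, -0.7
c, d = np.zeros(n), np.ones(n)


def hard_intervene(B, c, d, i, mean, var):
    """Replace the mechanism of node i by N(mean, var), cutting its parents."""
    B2, c2, d2 = B.copy(), c.copy(), d.copy()
    B2[i, :] = 0.0
    c2[i], d2[i] = mean, var
    return B2, c2, d2


def latent_score(Z, B, c, d):
    """Row-wise score grad_z log p(z) of the SCM at samples Z of shape (m, n)."""
    R = (Z - Z @ B.T - c) / d  # standardized mechanism residuals
    return -R @ (np.eye(len(c)) - B)


def observed_score(X, B, c, d, G):
    """Row-wise score of X = G Z, computed directly from the Gaussian law of X."""
    A = np.linalg.inv(np.eye(len(c)) - B)
    mean_x = G @ (A @ c)
    cov_x = G @ A @ np.diag(d) @ A.T @ G.T
    return -np.linalg.solve(cov_x, (X - mean_x).T).T


G = rng.normal(size=(n, n)) + 2 * np.eye(n)  # hypothetical invertible mixing
X = rng.normal(size=(500, n))                # evaluation points in observation space
Z = np.linalg.solve(G, X.T).T                # matching latent points z = G^{-1} x
obs = (B, c, d)
intv = hard_intervene(B, c, d, i=1, mean=1.0, var=0.5)

# Chain rule: s_X(x) = G^{-T} s_Z(G^{-1} x), i.e. row-wise s_X = s_Z @ G^{-1}.
for env in (obs, intv):
    assert np.allclose(observed_score(X, *env, G), latent_score(Z, *env) @ np.linalg.inv(G))

# Observation-space score difference between the two environments.
delta_x = observed_score(X, *obs, G) - observed_score(X, *intv, G)


def support(delta, tol=1e-8):
    """Coordinates on which the score difference is not identically zero."""
    return set(np.flatnonzero(np.abs(delta).max(axis=0) > tol).tolist())


# Pull back with an encoder H (z_hat = H x): delta_z_hat = H^{-T} delta_x, row-wise delta_x @ H^{-1}.
H_true = np.linalg.inv(G)
H_wrong = rng.normal(size=(n, n))  # hypothetical incorrect encoder
support_true = support(delta_x @ np.linalg.inv(H_true))
support_wrong = support(delta_x @ np.linalg.inv(H_wrong))
print("true encoder:", sorted(support_true), "random encoder:", sorted(support_wrong))
```

The log-determinant term of the change of variables is the same in every environment, which is why it cancels in score differences; that cancellation is what makes differences, rather than raw scores, the useful signal here.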

Paper Structure (41 sections, 12 theorems, 98 equations, 2 tables, 1 algorithm)

INTRODUCTION
Background
Related Work
PROBLEM SETTING
IDENTIFIABILITY AND ACHIEVABILITY RESULTS
GSCALE-I ALGORITHM
PROPERTIES OF GSCALE-I
Properties of Score Functions
Analysis of Algorithm Steps
EMPIRICAL EVALUATIONS
CONCLUSION
Related Work
Identifiable representation learning.
Causal representation learning.
CRL from interventions.
...and 26 more sections

Key Result

Theorem 1

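Using observational data and interventional data from two uncoupled environments per node $i$, for which each pair in $\{p_i, q_i, \tilde{q}_i\}$ satisfies interventional discrepancy, suffices to guarantee perfect identifiability of the latent causal graph and the latent variables.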
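
Theorem 1 does not require knowing which two interventional environments target the same node. As a hedged illustration of why such a coupling can be inferred from scores, the sketch below reads $q_i$ and $\tilde{q}_i$ as the two interventional mechanisms of node $i$ (our reading of the notation) and assumes, for illustration only, a hypothetical linear Gaussian chain $Z_0 \to Z_1 \to Z_2$, a square invertible linear mixing $X = GZ$, and closed-form scores. If environments $a$ and $b$ hard-intervene on the same node $i$ with $q_i \neq \tilde{q}_i$, the latent score difference is supported on coordinate $i$ alone, so all observation-space differences $s^a(x) - s^b(x)$ lie on one line and their stacked matrix has rank one; in this example, pairs targeting different nodes give rank at least two. Matching environments by this rank test is an illustrative heuristic, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Hypothetical linear Gaussian SCM Z0 -> Z1 -> Z2 and invertible linear mixing X = G Z.
B = np.zeros((n, n))
B[1, 0], B[2, 1] = 0.9, -0.7
c, d = np.zeros(n), np.ones(n)
G = rng.normal(size=(n, n)) + 2 * np.eye(n)


def hard_intervene(i, mean, var):
    """Mechanism parameters after replacing node i's mechanism with N(mean, var)."""
    B2, c2, d2 = B.copy(), c.copy(), d.copy()
    B2[i, :] = 0.0
    c2[i], d2[i] = mean, var
    return B2, c2, d2


def observed_score(X, B, c, d):
    """Closed-form row-wise score of X = G Z (in practice it would have to be estimated)."""
    A = np.linalg.inv(np.eye(n) - B)
    mean_x = G @ (A @ c)
    cov_x = G @ A @ np.diag(d) @ A.T @ G.T
    return -np.linalg.solve(cov_x, (X - mean_x).T).T


def numerical_rank(M, tol=1e-8):
    """Number of singular values above tol relative to the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


X = rng.normal(size=(200, n))
# Environment b of the second set targets node hidden_perm[b]; the learner does not see this.
hidden_perm = [2, 0, 1]
first_set = [hard_intervene(i, mean=1.0, var=0.5) for i in range(n)]
second_set = [hard_intervene(j, mean=-1.0, var=2.0) for j in hidden_perm]

scores_1 = [observed_score(X, *env) for env in first_set]
scores_2 = [observed_score(X, *env) for env in second_set]

# ranks[a, b]: rank of the stacked score differences between environment a and environment b.
ranks = np.array([[numerical_rank(s1 - s2) for s2 in scores_2] for s1 in scores_1])
coupling = {a: int(np.argmin(ranks[a])) for a in range(n)}  # rank-one pair = same intervened node
print(ranks)
print(coupling)
```

Here `coupling` maps each environment of the first set to the environment of the second set that hard-intervenes on the same latent node, recovered without using `hidden_perm`.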

Theorems & Definitions (17)

Definition 1: Interventional discrepancy
Definition 2: Coupled/Uncoupled environments
Definition 3: Perfect Identifiability
Theorem 1: Uncoupled Environments
Theorem 2: Coupled Environments
Theorem 3: No Observational Data
Remark 1
Lemma 1: Score Changes
Lemma 2: Score Difference Transformation
Lemma 3: Score Change Matrix Density
...and 7 more
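
Lemma 1 (Score Changes) is listed above without its statement. The sketch below checks a standard property of hard interventions that score-based CRL builds on, under assumptions made purely for illustration: a hypothetical four-node linear Gaussian SCM with edges $Z_0 \to Z_1$, $Z_0 \to Z_2$, $Z_1 \to Z_2$, $Z_2 \to Z_3$, and closed-form scores. A hard intervention on node $i$ replaces only the factor $p_i(z_i \mid z_{\mathrm{pa}(i)})$ of the joint density by $q_i(z_i)$, so the observational and interventional log-densities differ by $\log p_i(z_i \mid z_{\mathrm{pa}(i)}) - \log q_i(z_i)$, and their score difference can be nonzero only on $\{i\} \cup \mathrm{pa}(i)$.

```python
import numpy as np

# Hypothetical DAG: Z0 -> Z1, Z0 -> Z2, Z1 -> Z2, Z2 -> Z3 (B[child, parent] = edge weight).
n = 4
B = np.zeros((n, n))
B[1, 0], B[2, 0], B[2, 1], B[3, 2] = 0.8, 0.3, -0.5, 1.2
c, d = np.zeros(n), np.ones(n)


def latent_score(Z, B, c, d):
    """Row-wise grad_z log p(z) for the linear Gaussian SCM Z = B Z + c + noise."""
    R = (Z - Z @ B.T - c) / d  # standardized mechanism residuals
    return -R @ (np.eye(len(c)) - B)


def hard_intervene(B, c, d, i, mean, var):
    """Replace node i's mechanism with N(mean, var); children of i are unchanged."""
    B2, c2, d2 = B.copy(), c.copy(), d.copy()
    B2[i, :] = 0.0  # node i no longer depends on its parents
    c2[i], d2[i] = mean, var
    return B2, c2, d2


Z = np.random.default_rng(2).normal(size=(300, n))
target = 2
parents = set(np.flatnonzero(B[target]).tolist())  # pa(2) = {0, 1}
diff = latent_score(Z, B, c, d) - latent_score(Z, *hard_intervene(B, c, d, target, 1.0, 0.5))
changed = set(np.flatnonzero(np.abs(diff).max(axis=0) > 1e-9).tolist())
print(sorted(changed))  # -> [0, 1, 2]
```

Coordinate 3 (the child of the intervened node) is unaffected because its own mechanism, and hence its factor in the joint density, is identical in both environments.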