Causal Representation Learning in Temporal Data via Single-Parent Decoding

Philippe Brouillard, Sébastien Lachapelle, Julia Kaltenborn, Yaniv Gurwicz, Dhanya Sridhar, Alexandre Drouin, Peer Nowack, Jakob Runge, David Rolnick

TL;DR

This work considers a temporal model with a sparsity assumption, namely single-parent decoding, and proposes a differentiable method, Causal Discovery with Single-parent Decoding (CDSD), that simultaneously learns the underlying latents and a causal graph over them.

Abstract

Scientific research often seeks to understand the causal structure underlying high-level variables in a system. For example, climate scientists study how phenomena, such as El Niño, affect other climate processes at remote locations across the globe. However, scientists typically collect low-level measurements, such as geographically distributed temperature readings. From these, one needs to learn both a mapping to causally-relevant latent variables, such as a high-level representation of the El Niño phenomenon and other processes, as well as the causal model over them. The challenge is that this task, called causal representation learning, is highly underdetermined from observational data alone, requiring other constraints during learning to resolve the indeterminacies. In this work, we consider a temporal model with a sparsity assumption, namely single-parent decoding: each observed low-level variable is only affected by a single latent variable. Such an assumption is reasonable in many scientific applications that require finding groups of low-level variables, such as extracting regions from geographically gridded measurement data in climate research or capturing brain regions from neural activity data. We demonstrate the identifiability of the resulting model and propose a differentiable method, Causal Discovery with Single-parent Decoding (CDSD), that simultaneously learns the underlying latents and a causal graph over them. We assess the validity of our theoretical results using simulated data and showcase the practical validity of our method in an application to real-world data from the climate science field.
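To make the single-parent decoding assumption concrete, the following is a minimal simulation sketch and not the authors' implementation. It assumes linear lag-1 latent dynamics and a linear decoder; the function name `simulate_single_parent`, the weight ranges, and the noise levels are all hypothetical choices made for illustration. The only property taken from the abstract is that each observed variable is affected by exactly one latent variable, which here means each row of the decoding matrix `W` has a single non-zero entry.

```python
import numpy as np


def simulate_single_parent(T=200, d_z=3, d_x=12, noise_std=0.1, seed=0):
    """Simulate latents z and observations x under single-parent decoding.

    Illustrative assumptions (not from the paper): linear lag-1 dynamics,
    linear decoding, Gaussian noise.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical lag-1 latent graph: G[i, j] = 1 means z_j at t-1 affects z_i at t.
    G = (rng.random((d_z, d_z)) < 0.5).astype(float)
    np.fill_diagonal(G, 1.0)

    # Weighted transition matrix, rescaled so the dynamics stay stable.
    A = G * rng.uniform(0.5, 1.0, size=(d_z, d_z))
    spectral_radius = np.abs(np.linalg.eigvals(A)).max()
    A /= max(1.0, spectral_radius / 0.9)

    # Single-parent decoding: every observed variable has exactly one latent parent.
    # The modulo assignment guarantees each latent has at least one child.
    parent = rng.permutation(np.arange(d_x) % d_z)
    W = np.zeros((d_x, d_z))
    W[np.arange(d_x), parent] = rng.uniform(0.5, 1.5, size=d_x)

    # Roll out the latent process, then decode to the observed space.
    z = np.zeros((T, d_z))
    z[0] = rng.normal(size=d_z)
    for t in range(1, T):
        z[t] = A @ z[t - 1] + noise_std * rng.normal(size=d_z)
    x = z @ W.T + noise_std * rng.normal(size=(T, d_x))
    return x, z, W, G


if __name__ == "__main__":
    x, z, W, G = simulate_single_parent()
    # Each row of W has one non-zero entry, so the rows partition into d_z groups.
    print((W != 0).sum(axis=1))
```

Grouping the observed variables by the column of their non-zero entry in `W` yields the kind of partition the abstract refers to, such as regions of a geographic grid.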

Paper Structure

This paper contains 43 sections, 5 theorems, 59 equations, 17 figures, 3 tables.

Key Result

Proposition 0

Assume we have two models $p(\bm{x}^{\leq T}, \bm{z}^{\leq T})$ and $\hat{p}(\bm{x}^{\leq T}, \hat{\bm{z}}^{\leq T})$ as specified in sec:gen_mod with parameters $({\bm{g}}, {\bm{f}}, G, \sigma^2)$ and $(\hat{{\bm{g}}}, \hat{{\bm{f}}}, \hat{G}, \hat{\sigma}^2)$, respectively. Assume further that ${\

Figures (17)

  • Figure 1: In the proposed generative model, the variables $\bm{z}$ are latent and $\bm{x}$ are observable variables. $G^k$ represents the connections between the latent variables, and $F$ the connections between the latents and the observables (dashed lines). The colors represent the different groups. For clarity, we illustrate here connections only up to $G^1$, but our method also leverages connections of higher order.
  • Figure 2: Comparison of Varimax-PCMCI and CDSD in terms of SHD (lower is better) on simulated datasets with linear decoding and both a) linear and b) nonlinear latent dynamics.
  • Figure 3: Comparison of CDSD and Varimax-PCMCI in terms of MCC (higher is better) and SHD (lower is better) on simulated datasets with linear dynamics and nonlinear decoding.
  • Figure 4: Comparison of CDSD to DMS and iVAE in terms of MCC.
  • Figure 5: Overview of the climate science results for CDSD. a) Segmentation of the Earth's surface according to $W$. The groups are colored and numbered based on the latent variable to which they are related. b) Adjacency matrices for latent dynamic graphs $G^1, \ldots, G^5$, shown as $d_z \times d_z$ heatmaps. c) Subgraph of $G^1$ showing the learned causal relationships between known ENSO-related regions.
  • ...and 12 more figures
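
Several of the captions above report SHD, the structural Hamming distance between a learned graph and the true one. The sketch below shows one common convention and is not necessarily the exact variant used in the paper: it counts the entries at which two binary adjacency matrices differ. It assumes the learned latents have already been aligned with the true ones (for example by a permutation), and the function name `shd` is hypothetical.

```python
import numpy as np


def shd(true_adj, learned_adj):
    """Count the entries where two binary adjacency matrices disagree.

    Assumption: both matrices use the same node ordering. Under this
    convention a reversed edge contributes 2; some definitions count it as 1.
    """
    true_adj = np.asarray(true_adj).astype(bool)
    learned_adj = np.asarray(learned_adj).astype(bool)
    if true_adj.shape != learned_adj.shape:
        raise ValueError("adjacency matrices must have the same shape")
    return int(np.sum(true_adj != learned_adj))


if __name__ == "__main__":
    G_true = [[0, 1], [0, 0]]
    G_learned = [[0, 1], [1, 0]]  # one extra edge
    print(shd(G_true, G_learned))
```

For lagged graphs such as $G^1, \ldots, G^5$, every edge points from the past to the present, so edge reversals cannot occur and the two conventions coincide.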

Theorems & Definitions (8)

  • Proposition 0: Identifiability of $\bm{f}$ and $p(\bm{z}^{\leq T})$ up to diffeomorphism
  • Proposition 0: Identifying latents of $\bm{f}$
  • Lemma 1: Denoising $\bm{x}$
  • proof
  • Proposition 1: Identifiability of $\bm{f}$ and $p(\bm{z}^{\leq T})$ up to diffeomorphism
  • proof
  • Proposition 1: Identifying latents of $\bm{f}$
  • proof