Structural Disentanglement of Causal and Correlated Concepts

Qilong Zhao, Shiyu Wang, Zeeshan Memon, Yang Qiao, Guangji Bai, Bo Pan, Zhaohui Qin, Liang Zhao

TL;DR

This work addresses controllable data generation in settings where latent factors exhibit both causal and correlational dependencies. The authors introduce C$^2$VAE, a two-phase variational framework that (i) learns a structured latent space with a linear SCM over latent factors and a correlation mask linking latent factors to concepts, and (ii) enables two controllable generation modes via root factors that govern the causal relations and observed concepts. Key contributions include a joint ELBO with targeted regularizers, a correlation-aware prior, and an invertible mapping for reliable concept inversion, together with identifiability guarantees under iVAE assumptions. Empirical results across synthetic and real-world datasets show improved generation quality, stronger concept disentanglement, and faithful interventions compared to strong baselines, highlighting the practical impact for reliable, fine-grained controllable generation.
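As a rough illustration of the "linear SCM over latent factors" and of control via root factors described above, the following is a minimal sketch in plain Python. The four-factor graph, the edge weights, and the helper names `root_factors`, `propagate`, and `factor_control` are all hypothetical and chosen for illustration; they are not taken from the paper, and the sketch omits the encoder, decoder, and correlation mask entirely.

```python
# Hypothetical 4-factor linear SCM (illustrative only, not the paper's graph).
# A[i][j] is the weight of the edge w_i -> w_j.
A = [
    [0.0, 0.0, 0.8, 0.5],   # w_0 -> w_2, w_0 -> w_3
    [0.0, 0.0, 0.6, -0.4],  # w_1 -> w_2, w_1 -> w_3
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]


def root_factors(A):
    """Return indices of factors with no incoming edge (all-zero column)."""
    n = len(A)
    return [j for j in range(n) if all(A[i][j] == 0.0 for i in range(n))]


def propagate(A, eps):
    """Solve w_j = sum_i A[i][j] * w_i + eps_j.

    Assumes A is strictly upper triangular, so the index order is already
    a topological order of the causal graph.
    """
    n = len(A)
    w = [0.0] * n
    for j in range(n):
        w[j] = eps[j] + sum(A[i][j] * w[i] for i in range(j))
    return w


def factor_control(A, eps, edits):
    """Overwrite selected root factors, then re-propagate downstream."""
    roots = set(root_factors(A))
    new_eps = list(eps)
    for idx, value in edits.items():
        if idx not in roots:
            raise ValueError(f"w_{idx} is not a root factor")
        # For a root factor, w equals its exogenous term, so editing eps
        # is the same as setting the factor's value directly.
        new_eps[idx] = value
    return propagate(A, new_eps)


eps = [1.0, 2.0, 0.0, 0.0]                  # exogenous terms (made up)
w = propagate(A, eps)                       # factors before the edit
w_edit = factor_control(A, eps, {0: 2.0})   # set root factor w_0 to 2.0
```

In this toy graph, editing `w_0` changes the downstream factors `w_2` and `w_3` while leaving the other root `w_1` untouched, which is the behaviour the factor-control mode relies on. The strictly upper-triangular assumption keeps the solver to a single forward pass; a general DAG would need a topological sort first.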

Abstract

Controllable data generation aims to synthesize data by specifying values for target concepts. Achieving this reliably requires modeling the underlying generative factors and their relationships. In real-world scenarios, these factors exhibit both causal and correlational dependencies, yet most existing methods model only part of this structure. We propose the Causal-Correlation Variational Autoencoder (C$^2$VAE), a unified framework that jointly captures causal and correlational relationships among latent factors. C$^2$VAE organizes the latent space into a structured graph, identifying a set of root causes that govern the generative processes. By optimizing only the root factors relevant to target concepts, the model enables efficient and faithful control. Experiments on synthetic and real-world datasets demonstrate that C$^2$VAE improves generation quality, disentanglement, and intervention fidelity over existing baselines.

Paper Structure (29 sections, 2 theorems, 11 equations, 6 figures, 3 tables)

Key Result

Theorem 1

Under the iVAE assumptions with $u\equiv y$, the latent vector $w$ is identifiable from $p(x,y)$ up to $\sim_{\textsc{iVAE}}$. In particular, when each $p(w_i\!\mid\!y)$ is modulated by sufficiently rich sufficient statistics (e.g., mean and variance) and the variability condition holds, the residua

Figures (6)

  • Figure 1: A light source illuminates a swinging pendulum, casting a shadow on the floor. Observers can report many concepts (e.g., Pendulum angle, Light position, Height, Projection, Shadow position, Shadow length, and Time), but these often overlap and only partially reflect the true generative factors. For example, neither Height nor Pendulum angle alone provides a complete description of the pendulum's motion. Existing causal generative models often struggle with such spurious or redundant relationships among observed concepts. We aim to uncover a compact set of underlying latent factors and the causal structure among them. In this example, $w_1$ and $w_2$ (shown in blue) are root factors—independent sources that drive all other latent factors and, through them, the observed concepts.
  • Figure 2: Overview of C$^2$VAE, which consists of a learning phase and a generation phase. During the learning phase, a Causal Layer infers the structural causal graph among factors $w$, while a Correlation Layer captures correlation between concepts $y$ and $w$. In the generation phase, we provide two complementary modes of control: ① Factor control: edit the root factors $w_{\text{root}}$ (a subset of factors $w$) to steer downstream factors, and thus the desired $x^{\star}$. ② Concept control: specify target concepts $y^{\star}$; a bijective mapping propagates these targets back to the corresponding root factors $w_{\text{root}}$, enabling the decoder to generate an image $x^{\star}$ with $y^{\star}$.
  • Figure 3: The learning process of the mask pooling layer of C$^2$VAE. Rows correspond to Pendulum angle, Light position, Time, Shadow length, and Shadow position, from the first to the last.
  • Figure 4: The learning process of the causal graph of C$^2$VAE. In each subfigure, from left to right and top to bottom, the slots correspond to $w_1$ through $w_5$ and then $z_1$ through $z_3$. Here, $w_1$ to $w_5$ each represent a targeted concept: Pendulum angle, Light position, Time, Shadow length and Shadow position; while $z_1$ to $z_3$ correspond to non-targeted factors.
  • Figure 5: ② Concept control on pendulum dataset by setting values for all concepts.
  • ...and 1 more figure

Theorems & Definitions (2)

  • Theorem 1: identifiability of $w$
  • Proposition 2: Mask identifiability