Causal Representation Meets Stochastic Modeling under Generic Geometry
Jiaxu Ren, Yixin Wang, Biwei Huang
TL;DR
This work addresses the identifiability of latent causal representations when the latent dynamics are continuous-time stochastic point processes, such as Hawkes processes. It develops a theory based on weak convergence and algebraic geometry, using cumulants and Veronese embeddings to establish necessary and sufficient identifiability conditions under generic non-invertible mixing, for both linear and generic nonlinear mappings. It introduces MUTATE, a time-adaptive variational autoencoder with a Neural PSD module that recovers latent stochastic dynamics from high-dimensional observations, and provides testable conditions via interventions on the kernel dynamics. Across simulations and empirical data, MUTATE demonstrates strong latent recovery and causal structure identification, enabling scientific inferences about systems such as mutation accumulation in genomics and neuron spiking mechanisms.
Abstract
Learning meaningful causal representations from observations has emerged as a crucial task for facilitating machine learning applications and driving scientific discoveries in fields such as climate science, biology, and physics. This process involves disentangling high-level latent variables and their causal relationships from low-level observations. Previous work in this area that achieves identifiability typically focuses on cases where the observations are either i.i.d. or follow a latent discrete-time process. Nevertheless, many real-world settings require identifying latent variables that are continuous-time stochastic processes (e.g., multivariate point processes). To this end, we develop identifiable causal representation learning for continuous-time latent stochastic point processes. We study its identifiability by analyzing the geometry of the parameter space. Furthermore, we develop MUTATE, an identifiable variational autoencoder framework with a time-adaptive transition module to infer stochastic dynamics. Across simulated and empirical studies, we find that MUTATE can effectively address scientific questions, such as how mutations accumulate in genomics and which mechanisms drive neuron spike triggers in response to time-varying dynamics.
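
To make the notion of a continuous-time latent point process concrete, the following is a minimal sketch of simulating a univariate Hawkes process with Ogata's thinning method. It is an illustration only and is not part of MUTATE: the exponential kernel, the function name `simulate_hawkes`, and the parameters `mu`, `alpha`, `beta`, and `horizon` are all assumptions made for this example, and the multivariate latent processes considered in this work would require a matrix of kernels rather than a single one.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by Ogata's thinning.

    Assumed intensity (an illustrative choice, not taken from the paper):
        lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    with baseline mu > 0 and branching ratio alpha / beta < 1.
    """
    if mu <= 0 or alpha < 0 or beta <= 0 or alpha >= beta:
        raise ValueError("need mu > 0, beta > 0 and 0 <= alpha < beta")

    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity at the current time

    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait >= horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)  # decay to the candidate time

        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # each accepted event raises the intensity

    return events


if __name__ == "__main__":
    times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=1)
    print(len(times), "events; first few:", [round(x, 3) for x in times[:5]])
```

In this sketch each accepted event raises the intensity by `alpha`, which then decays at rate `beta`, so events cluster in time. In the setting studied here such event sequences are latent and are only observed through a high-dimensional mixing.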
