Causal Temporal Representation Learning with Nonstationary Sparse Transition
Xiangchen Song, Zijian Li, Guangyi Chen, Yujia Zheng, Yewen Fan, Xinshuai Dong, Kun Zhang
TL;DR
This work tackles learning causal temporal representations from nonstationary sequences without observed domain indices. It develops identifiability theory showing that, under sparse transition constraints and sufficient variability, domain regimes can be recovered up to label swapping and latent causal processes can be identified up to permutation and component-wise transformations. The CtrlNS framework operationalizes these ideas using a sparse-transition module, a prior network, and a VAE-based encoder-decoder, yielding accurate recovery of distribution shifts and latent factors. Empirically, CtrlNS demonstrates strong identifiability and improved performance on synthetic data and weakly supervised action segmentation benchmarks, highlighting its practical potential for transparent, domain-aware modeling of nonstationary temporal data.
Abstract
Causal Temporal Representation Learning (Ctrl) methods aim to identify the temporal causal dynamics of complex nonstationary temporal sequences. Despite their success, existing Ctrl methods require either directly observing the domain variables or assuming a Markov prior on them. These requirements limit their application in real-world scenarios where such prior knowledge of the domain variables is unavailable. To address this problem, this work adopts a sparse transition assumption, aligned with intuitive human understanding, and presents identifiability results from a theoretical perspective. In particular, we characterize how much variability in the transitions is needed for a model to identify the distribution shifts. Based on the theoretical result, we introduce a novel framework, Causal Temporal Representation Learning with Nonstationary Sparse Transition (CtrlNS), designed to leverage the constraints on transition sparsity and conditional independence to reliably identify both distribution shifts and latent factors. Our experimental evaluations on synthetic and real-world datasets demonstrate significant improvements over existing baselines, highlighting the effectiveness of our approach.
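Because the domain indices are never observed, the regimes a model identifies can only match the true ones up to a relabeling. The following is a minimal sketch of what that means in practice, not part of CtrlNS itself: it searches over all relabelings of the estimated domain labels and reports the best agreement with the ground truth. The function name `best_label_alignment` and the toy label sequences are assumptions made for illustration only.

```python
from itertools import permutations


def best_label_alignment(true_labels, est_labels, num_domains):
    """Return (accuracy, mapping) under the best relabeling of est_labels.

    mapping[k] is the true domain label assigned to estimated label k.
    Brute force over all permutations, so only suitable for a few domains.
    """
    best_acc, best_map = -1.0, None
    for perm in permutations(range(num_domains)):
        # Count time steps where the relabeled estimate matches the truth.
        hits = sum(perm[e] == t for t, e in zip(true_labels, est_labels))
        acc = hits / len(true_labels)
        if acc > best_acc:
            best_acc, best_map = acc, perm
    return best_acc, best_map


# Hypothetical toy sequences: the estimate is correct, but under swapped labels.
true_labels = [0, 0, 1, 1, 2, 2, 0, 1]
est_labels = [2, 2, 0, 0, 1, 1, 2, 0]

accuracy, mapping = best_label_alignment(true_labels, est_labels, 3)
print(accuracy, mapping)
```

In this toy case the estimated labels disagree with the true labels at every time step, yet they describe exactly the same regimes once relabeled, which is why recovery is assessed only up to label swapping.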
