Temporally Disentangled Representation Learning under Unknown Nonstationarity

Xiangchen Song, Weiran Yao, Yewen Fan, Xinshuai Dong, Guangyi Chen, Juan Carlos Niebles, Eric Xing, Kun Zhang

TL;DR

This work tackles unsupervised learning of latent causal representations from nonstationary time series with time-delayed influences, where domain shifts are unobserved. It develops identifiability theory by combining nonlinear ICA with Markov-switching nonstationarity and then introduces NCTRL, a three-component framework consisting of an Autoregressive Hidden Markov Module, a Prior Network, and an Encoder-Decoder, to recover time-delayed latent causal variables and their relations solely from observed data. The approach is supported by theoretical identifiability guarantees under mild conditions and demonstrates strong empirical performance on synthetic and real video datasets, surpassing baselines that cannot adequately exploit nonstationarity. This has practical implications for learning interpretable, causally meaningful latent factors in complex sequential data without requiring labeled domain information. The methods enable more transparent downstream analysis and decision-making in domains such as video understanding and behavior analysis.

Abstract

In unsupervised causal representation learning for sequential data with time-delayed latent causal influences, strong identifiability results for the disentanglement of causally related latent variables have been established in stationary settings by leveraging temporal structure. However, in nonstationary settings, existing work has only partially addressed the problem, either by utilizing observed auxiliary variables (e.g., class labels and/or domain indexes) as side information or by assuming simplified latent causal dynamics. Both constrain the method to a limited range of scenarios. In this study, we further explore the Markov assumption for time-delayed, causally related processes in the nonstationary setting and show that, under mild conditions, the independent latent components can be recovered from their nonlinear mixture up to a permutation and a component-wise transformation, without observing auxiliary variables. We then introduce NCTRL, a principled estimation framework, to reconstruct time-delayed latent causal variables and identify their relations from measured sequential data only. Empirical evaluations demonstrate the reliable identification of time-delayed latent causal influences, with our methodology substantially outperforming existing baselines that fail to exploit the nonstationarity adequately and consequently cannot distinguish distribution shifts.
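
The phrase "up to a permutation and a component-wise transformation" can be written out explicitly. The following is a sketch in assumed notation, not a quotation of the paper's Definition 1: $g$ stands for the deterministic mixing function from latents to observations, $\hat{\mathbf{z}}_t$ for the learned estimate, and $n$ for the number of latent components.

```latex
% Assumed notation: g = mixing function, \hat{\mathbf{z}}_t = estimated latents,
% \pi = a permutation of {1, ..., n}, T_i = an invertible scalar function.
\mathbf{x}_t = g(\mathbf{z}_t), \qquad
\hat{z}_{t,i} = T_i\bigl(z_{t,\pi(i)}\bigr), \quad i = 1, \dots, n.
```

Under this reading, each estimated component matches exactly one true component, possibly reordered and passed through its own invertible rescaling, and no estimated component mixes two true ones.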

Paper Structure

This paper contains 43 sections, 5 theorems, 46 equations, 5 figures, 7 tables.

Key Result

Lemma 1

(Theorem 3.1 in balsells-rodas2023on) Define the following first-order Markov switching model family under non-linear Gaussian families with the following assumptions held: where $\bm{m}(\mathbf{x}_{t-1}, c)$ and $\bm{\Sigma}(\mathbf{x}_{t-1}, c)$ are non-linear with respect to $\mathbf{z}_{t-1}$ and denote the mean and covariance matrix of the Gaussian distribution, and $\mathcal{C}$ is an inde
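
The lemma statement is truncated in this excerpt and its defining equation is not shown, so the model family cannot be reproduced exactly here. As a rough illustration of what a first-order Markov switching model with non-linear Gaussian transitions looks like, the sketch below samples a discrete state $c_t$ from a Markov chain and then draws $\mathbf{x}_t$ from a Gaussian whose mean and diagonal covariance depend non-linearly on $\mathbf{x}_{t-1}$ and $c_t$. The function name, the `tanh` mean, the noise scale, and the transition matrix are all assumptions chosen for illustration, not the paper's choices.

```python
import numpy as np


def simulate_markov_switching(T=200, dim=2, n_states=2, seed=0):
    """Sample (c_t, x_t) from a toy first-order Markov switching model.

    Hypothetical illustration only: requires n_states >= 2.
    """
    rng = np.random.default_rng(seed)

    # Assumed "sticky" transition matrix: stay with prob. 0.9, else switch.
    A = np.full((n_states, n_states), 0.1 / (n_states - 1))
    np.fill_diagonal(A, 0.9)

    # One random weight matrix per state, used in the assumed mean function.
    W = rng.normal(size=(n_states, dim, dim))

    c = np.zeros(T, dtype=int)
    x = np.zeros((T, dim))
    c[0] = rng.integers(n_states)
    x[0] = rng.normal(size=dim)

    for t in range(1, T):
        # Discrete state follows a first-order Markov chain.
        c[t] = rng.choice(n_states, p=A[c[t - 1]])
        # Assumed non-linear mean m(x_{t-1}, c_t).
        mean = np.tanh(W[c[t]] @ x[t - 1])
        # Assumed diagonal standard deviation depending on x_{t-1} and c_t.
        std = 0.1 * (c[t] + 1) + 0.05 * np.abs(np.sin(x[t - 1]))
        x[t] = mean + std * rng.normal(size=dim)

    return c, x, A
```

Calling `simulate_markov_switching(T=50)` returns the state sequence, the observations, and the transition matrix. Only `x` would be available to a learner in the unobserved-domain setting; `c` is returned purely for inspection.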

Figures (5)

  • Figure 1: Graphical models for different settings in causally related time-delayed time-series data with a visual illustration. (a) is a stationary setting in which the transition function $\mathbf{z}_{t+1} = f_z(\mathbf{z}_t)$ remains universally the same. (b) is the setting widely explored in existing work, in which the transition function $f_{z}$ changes according to different domains (denoted as $c_t$), and all these domain indices are observed. (c) captures the unobserved domain indices by introducing a Markov chain on $c_t$. (d) is a more general form to model the time-series data in this work. It allows nonstationary settings and it does not require the domain indices to be observed. In all cases, the mapping from $\mathbf{z}_t$ to $\mathbf{x}_t$ is deterministic, which is indicated by a gray arrow.
  • Figure 2: Graphical model used in Lemma 1, which only considers $c_t$ and $\mathbf{x}_t$.
  • Figure 3: Illustration of NCTRL with (1) Autoregressive Hidden Markov Module, (2) Prior Network, and (3) Encoder-Decoder Module.
  • Figure 4: Modified Cartpole results: (a) MCC for causally-related factors; (b) scatterplots between estimated and true factors; and (c) latent traversal on a fixed video frame.
  • Figure 5: Result visualization of the MoSeq dataset. (Active, Inactive) show two representative video frames for the active and inactive phases and (Independent Components) visualize the discovered independent components with corresponding phases tagged with different colors.
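
Figure 4 reports MCC, which in this literature usually means the mean correlation coefficient between true and estimated latent factors after matching them up to a permutation. The sketch below is one common way to compute it, not necessarily the exact variant used in the paper: it assumes Pearson correlation, equal numbers of true and estimated components, and a brute-force search over permutations that is only practical for a handful of dimensions. The function name is hypothetical.

```python
import itertools

import numpy as np


def mean_correlation_coefficient(z_true, z_est):
    """Best-permutation mean absolute Pearson correlation.

    z_true, z_est: arrays of shape (n_samples, d) with the same d.
    Returns (mcc, perm) where perm[i] is the estimated component
    matched to true component i.
    """
    d = z_true.shape[1]
    # Cross-correlation block between true (rows) and estimated (columns).
    abs_corr = np.abs(np.corrcoef(z_true.T, z_est.T)[:d, d:])

    best_score, best_perm = -1.0, None
    # Brute force over all d! matchings; fine for small d only.
    for perm in itertools.permutations(range(d)):
        score = abs_corr[np.arange(d), list(perm)].mean()
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm
```

Taking absolute values makes the score insensitive to sign flips, and the permutation search handles reordering, which mirrors the indeterminacies allowed by the identifiability result. For larger `d`, the same matching is usually solved with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.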

Theorems & Definitions (10)

  • Definition 1: Identifiable Latent Causal Processes
  • Definition 2: Definition 3.1 in balsells-rodas2023on
  • Lemma 1
  • Definition 3: volume-preserving mapping
  • Theorem 1: identifiability of nonstationary hidden states
  • Theorem 2
  • Theorem A.1: identifiability of nonstationary hidden states
  • proof
  • Theorem A.2
  • proof