Continual Learning of Nonlinear Independent Representations
Boyang Sun, Ignavier Ng, Guangyi Chen, Yifan Shen, Qirong Ho, Kun Zhang
TL;DR
This work addresses how to learn identifiable representations when distribution shifts arrive sequentially. It develops a theoretical framework showing that identifiability in nonlinear ICA improves from the subspace level to the component-wise level as more distributions are observed: $n_s+1$ distributions suffice for subspace identifiability and $2n_s+1$ for component-wise identifiability. The authors propose Continual Causal Representation Learning (CCRL), which uses a VAE with a flow-based mapping and Gradient Episodic Memory (GEM) to preserve knowledge of past domains, and which achieves performance close to that of nonlinear ICA trained jointly on multiple offline distributions. Empirically, identifiability improves as more domains are observed, but a new domain does not necessarily help identify every latent variable; memory mechanisms help stabilize learning. Overall, the approach shows that sequential distribution changes can be leveraged to refine causal representations, with implications for robust transfer and continual reasoning in changing environments.
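The two thresholds quoted above can be read as a simple lookup from the number of distributions seen so far to the strongest level of identifiability they are sufficient for. The following is a minimal sketch of that arithmetic only: the function name `guaranteed_identifiability` and its string labels are hypothetical, `n_s` is used as in the summary above, and the other assumptions the theory may require are not checked.

```python
def guaranteed_identifiability(num_distributions: int, n_s: int) -> str:
    """Map a count of observed distributions to the identifiability level
    that the stated sufficient conditions cover.

    Hypothetical helper: it encodes only the two counts quoted in the
    summary (n_s + 1 and 2 * n_s + 1), not any further assumptions.
    """
    if num_distributions < 1 or n_s < 1:
        raise ValueError("expected num_distributions >= 1 and n_s >= 1")
    if num_distributions >= 2 * n_s + 1:
        return "component-wise"
    if num_distributions >= n_s + 1:
        return "subspace"
    # Below n_s + 1 distributions, neither sufficient condition is met.
    return "none"


if __name__ == "__main__":
    # Distributions arrive one at a time; report the level after each arrival.
    n_s = 3  # illustrative value, not taken from the paper
    for seen in range(1, 2 * n_s + 2):
        print(seen, guaranteed_identifiability(seen, n_s))
```

Because the counts are sufficient conditions, a result of "none" means only that neither threshold has been reached, not that identification is impossible.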
Abstract
Identifying the causal relations among variables of interest plays a pivotal role in representation learning, as it provides deep insight into the dataset. Identifiability, the central theme of this approach, normally hinges on leveraging data from multiple distributions (interventions, distribution shifts, time series, etc.). Despite the exciting developments in this field, a practical but often overlooked question is: what if those distribution shifts happen sequentially? In contrast to methods that require all distributions at once, any intelligent agent can abstract and refine learned knowledge sequentially, an ability known as lifelong learning. In this paper, focusing on the nonlinear independent component analysis (ICA) framework, we take a step toward enabling models to learn meaningful (identifiable) representations sequentially, a problem we term continual causal representation learning. We theoretically demonstrate that model identifiability progresses from the subspace level to the component-wise level as the number of distributions increases. Empirically, we show that our method achieves performance comparable to that of nonlinear ICA methods trained jointly on multiple offline distributions and that, surprisingly, an incoming distribution does not necessarily benefit the identification of all latent variables.
