Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning

Aapo Hyvarinen, Hiroaki Sasaki, Richard E. Turner

TL;DR

This work introduces a general framework for nonlinear ICA built on auxiliary variables that modulate the latent sources, enabling identifiability beyond i.i.d. data. It unifies and extends prior identifiability results based on temporal structure (TCL, PCL) by allowing a wide range of auxiliary variables, including time, history, and class labels, and it provides a practical learning algorithm, with proven consistency, based on contrastive logistic regression. The theoretical contributions center on conditional exponentiality and two main identifiability theorems, which differ according to the exponential-family order. Simulations demonstrate TCL-like performance without data segmentation and show the framework's capacity to incorporate nonstationarity, temporal dependencies, and supervised signals. Overall, the paper offers a versatile, principled path for recovering latent nonlinear components using auxiliary information, with broad applicability to self-supervised and supervised learning contexts.

Abstract

Nonlinear ICA is a fundamental problem for unsupervised representation learning, emphasizing the capacity to recover the underlying latent variables generating the data (i.e., identifiability). Recently, the very first identifiability proofs for nonlinear ICA have been proposed, leveraging the temporal structure of the independent components. Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure. It is based on augmenting the data by an auxiliary variable, such as the time index, the history of the time series, or any other available information. We propose to learn nonlinear ICA by discriminating between true augmented data and data in which the auxiliary variable has been randomized. This enables the framework to be implemented algorithmically through logistic regression, possibly in a neural network. We provide a comprehensive proof of the identifiability of the model as well as the consistency of our estimation method. The approach not only provides a general theoretical framework combining and generalizing previously proposed nonlinear ICA models and algorithms, but also brings practical advantages.
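
The data construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_contrastive_pairs`, its `seed` parameter, and the choice to randomize the auxiliary variable by shuffling it across samples are all assumptions made for the example. True pairs `(x, u)` get label 1, and pairs in which `u` has been randomized get label 0; a logistic regression (possibly a neural network) would then be trained to tell the two classes apart.

```python
import random


def make_contrastive_pairs(xs, us, seed=0):
    """Build a two-class dataset from observations xs and auxiliary values us.

    Class 1: true augmented data (x_t, u_t).
    Class 0: (x_t, u*), where u* comes from a random shuffle of us.
    Hypothetical helper written for illustration only.
    """
    if len(xs) != len(us):
        raise ValueError("xs and us must have the same length")
    rng = random.Random(seed)
    # Shuffling breaks the link between x and u but keeps the marginal
    # distribution of u. Some positions may keep their original u by chance.
    shuffled_us = list(us)
    rng.shuffle(shuffled_us)
    true_pairs = list(zip(xs, us))
    randomized_pairs = list(zip(xs, shuffled_us))
    pairs = true_pairs + randomized_pairs
    labels = [1] * len(true_pairs) + [0] * len(randomized_pairs)
    return pairs, labels


# Toy usage: three 2-dimensional observations with the time index as u.
xs = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.3]]
us = [0, 1, 2]
pairs, labels = make_contrastive_pairs(xs, us)
```

Shuffling is only one way to randomize the auxiliary variable; drawing `u*` independently from an estimate of its marginal distribution would serve the same purpose.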

Paper Structure

This paper contains 24 sections, 3 theorems, 33 equations, 1 figure.

Key Result

Theorem 1

Assume [assumptions omitted]. Then, in the limit of infinite data, $\mathbf{h}$ in the regression function provides a consistent estimator of the demixing in the nonlinear ICA model: the functions (hidden units) $h_i(\mathbf{x})$ give the independent components, up to scalar (component-wise) invertible transformations.
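
As a reading aid, the regression function that the theorem refers to can be written as below. This form is recalled from the published paper rather than taken from this summary, so the notation should be treated as an assumption: $n$ is the number of components, $\mathbf{u}$ is the auxiliary variable, and the $\psi_i$ are scalar-valued functions learned together with the feature extractor $\mathbf{h} = (h_1, \dots, h_n)$.

```latex
r(\mathbf{x}, \mathbf{u}) = \sum_{i=1}^{n} \psi_i\bigl(h_i(\mathbf{x}), \mathbf{u}\bigr)
```

Under this form, each $h_i$ enters the classifier only through its own term $\psi_i$, which is why the learned hidden units can play the role of the individual components in the statement above.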

Figures (1)

  • Figure 1: Performance measured by correlations between estimates and original quantities (see text). The non-conditionally-exponential case is shown in (a) and the exponential-family case in (b). "Proposed" uses the raw outputs of the neural network learned by our new method, "Proposed with ICA" adds a final linear ICA step, and "TCL" is time-contrastive learning (with final linear ICA), given for comparison. In (a), TCL was performed with $16$, $64$, and $256$ time segments. In (b), for each method, we report four cases, with $10$, $50$, $100$, and $300$ time segments.
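
A correlation-based score of the kind named in the caption could be computed roughly as follows. The exact evaluation procedure is not given here, so this is only a sketch under stated assumptions: the helper names `pearson` and `mean_matched_correlation` are hypothetical, absolute Pearson correlation is used, and estimated components are matched to sources by brute force over permutations (practical only for a handful of components).

```python
from itertools import permutations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def mean_matched_correlation(sources, estimates):
    """Mean absolute correlation under the best one-to-one matching.

    sources, estimates: lists with one sequence per component.
    Hypothetical helper; the order and sign of estimated components
    are arbitrary, hence the matching and the absolute value.
    """
    n = len(sources)
    corr = [[abs(pearson(s, e)) for e in estimates] for s in sources]
    best_total = max(
        sum(corr[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )
    return best_total / n


# Toy usage: the estimates are the sources, reordered and rescaled.
sources = [[1, 2, 3, 4], [1, -1, 1, -1]]
estimates = [[-2, 2, -2, 2], [2, 4, 6, 8]]
score = mean_matched_correlation(sources, estimates)
```

Pearson correlation only measures linear agreement, so a component recovered up to a strongly nonlinear invertible transformation can score below 1 even when it has been identified.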

Theorems & Definitions (4)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Theorem 3