Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA

Aapo Hyvarinen, Hiroshi Morioka

TL;DR

The paper introduces Time-Contrastive Learning (TCL), a discriminative, unsupervised method that leverages nonstationarity in time series to learn informative representations. It establishes a rigorous link between TCL and nonlinear ICA, showing that TCL followed by linear ICA identifies nonlinear sources up to monotone component-wise transformations, with full identifiability in a modulated Gaussian special case. The authors develop theory, extensions for dimension reduction and multiple nonlinearities, and validate the approach through simulations and resting-state MEG experiments, where TCL improves source recovery and reveals neuroscience-relevant networks. Overall, TCL provides a practical, theoretically principled framework for unsupervised feature learning in nonstationary data with strong identifiability guarantees for nonlinear ICA.

Abstract

Nonlinear independent component analysis (ICA) provides an appealing framework for unsupervised feature learning, but the models proposed so far are not identifiable. Here, we first propose a new intuitive principle of unsupervised deep learning from time series which uses the nonstationary structure of the data. Our learning principle, time-contrastive learning (TCL), finds a representation which allows optimal discrimination of time segments (windows). Surprisingly, we show how TCL can be related to a nonlinear ICA model, when ICA is redefined to include temporal nonstationarities. In particular, we show that TCL combined with linear ICA estimates the nonlinear ICA model up to point-wise transformations of the sources, and this solution is unique --- thus providing the first identifiability result for nonlinear ICA which is rigorous, constructive, as well as very general.
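The learning principle described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the toy data generator, the single absolute-value feature layer, the plain gradient-descent loop, and the names `make_toy_data` and `train_tcl` are all assumptions made for this example. Each sample is labelled with the index of its time segment, and a feature layer plus a multinomial logistic regression is trained to predict that label.

```python
import numpy as np

def make_toy_data(n_sources=2, n_segments=8, seg_len=200, seed=0):
    """Hypothetical toy data: Gaussian sources whose scale changes per segment."""
    rng = np.random.default_rng(seed)
    scales = rng.uniform(0.3, 3.0, size=(n_segments, n_sources))
    labels = np.repeat(np.arange(n_segments), seg_len)
    sources = rng.normal(size=(labels.size, n_sources)) * scales[labels]
    # Illustrative invertible nonlinear mixing: linear map, leaky ReLU, linear map.
    mixed = sources @ rng.normal(size=(n_sources, n_sources))
    mixed = np.where(mixed > 0, mixed, 0.2 * mixed)
    observed = mixed @ rng.normal(size=(n_sources, n_sources))
    observed = (observed - observed.mean(0)) / observed.std(0)
    return observed, labels

def softmax_cross_entropy(logits, labels):
    """Return the mean cross-entropy and the class probabilities."""
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(labels.size), labels]).mean()
    return loss, probs

def train_tcl(x, labels, n_features=4, n_steps=300, lr=0.05, seed=0):
    """Train features h(x) = |x W1 + b1| with a softmax over segment labels."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    n_classes = int(labels.max()) + 1
    w1 = rng.normal(size=(d, n_features)) / np.sqrt(d)
    b1 = np.zeros(n_features)
    w2 = np.zeros((n_features, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    losses = []
    for _ in range(n_steps):
        # Forward pass: feature layer, then multinomial logistic regression.
        pre = x @ w1 + b1
        h = np.abs(pre)
        loss, probs = softmax_cross_entropy(h @ w2 + b2, labels)
        losses.append(loss)
        # Backward pass for the mean cross-entropy loss.
        g_logits = (probs - onehot) / n
        g_pre = (g_logits @ w2.T) * np.sign(pre)
        w2 -= lr * (h.T @ g_logits)
        b2 -= lr * g_logits.sum(axis=0)
        w1 -= lr * (x.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    features = np.abs(x @ w1 + b1)
    return features, losses
```

Calling `train_tcl(*make_toy_data())` returns the learned features and the loss history. In the pipeline the abstract describes, the learned features would then be passed to a linear ICA step, which is not shown here.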

Paper Structure

This paper contains 18 sections, 3 theorems, 14 equations, 3 figures.

Key Result

Theorem 1

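Under the theorem's assumptions, after learning the parameter vector $\boldsymbol{\theta}$, the outputs of the feature extractor are equal to $q(\mathbf{s})=(q(s_1),q(s_2),\ldots,q(s_n))^T$ up to an invertible linear transformation. In other words, writing $\mathbf{h}(\mathbf{x};\boldsymbol{\theta})$ for the feature extractor output on the observed signal $\mathbf{x}$, $q(\mathbf{s}) = \mathbf{A}\,\mathbf{h}(\mathbf{x};\boldsymbol{\theta}) + \mathbf{d}$ for some constant invertible matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ and a constant vector $\mathbf{d} \in \mathbb{R}^n$.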
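To make "equal up to an invertible linear transformation" concrete, the sketch below checks the relation on synthetic numbers. Everything in it is an assumption chosen for illustration: the point-wise function is taken to be $q(s) = s^2$, the matrix and offset are random, and the "features" are constructed by hand rather than learned. An affine least-squares fit from the features recovers $q(\mathbf{s})$ essentially exactly, while the raw sources are not an affine function of the same features.

```python
import numpy as np

def affine_fit_residual(features, targets):
    """Fit targets ~ features @ M + c by least squares; return the max |residual|."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return np.abs(design @ coef - targets).max()

rng = np.random.default_rng(0)
s = rng.normal(size=(500, 3))                 # toy sources
q_s = s ** 2                                  # assumed point-wise function q
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # hypothetical mixing matrix
d = rng.normal(size=3)                        # hypothetical offset vector

# Hand-built features h satisfying q(s) = A h + d, i.e. h = A^{-1} (q(s) - d).
h = np.linalg.solve(A, (q_s - d).T).T

residual_q = affine_fit_residual(h, q_s)  # tiny: q(s) is affine in h
residual_s = affine_fit_residual(h, s)    # large: s itself is not affine in h
```

The second residual illustrates why the result is stated for $q(\mathbf{s})$ rather than for $\mathbf{s}$: the sign of each source is lost under this choice of $q$.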

Figures (3)

  • Figure 1: An illustration of how we combine a new generative nonlinear ICA model with the new learning principle called time-contrastive learning (TCL). (A) The probabilistic generative model of nonlinear ICA, where the observed signals are given by a nonlinear transformation of source signals, which are mutually independent, and have segment-wise nonstationarity. (B) In TCL we train a feature extractor sensitive to the nonstationarity of the data by using a multinomial logistic regression which attempts to discriminate between the segments, labelling each data point with the segment label $1,\ldots,T$. The feature extractor and logistic regression together can be implemented by a conventional multi-layer perceptron.
  • Figure 2: Simulation on artificial data. a) Mean classification accuracies of the MLR simultaneously trained with the feature-MLP to implement TCL, with different settings of the number of layers $L$ and segments. Note that chance levels (dotted lines) change as a function of the number of segments (see text). The MLR achieved higher accuracy than chance level. b) Mean absolute correlation coefficients between the true $q(s)$ and the features learned by TCL (solid line) and, for comparison: nonstationarity-of-variance-based linear ICA (NSVICA, dashed line), kernel-based nonlinear ICA (kTDSEP, dotted line), and denoising autoencoder (DAE, dash-dot line). TCL has much higher correlations than DAE or kTDSEP, and in the nonlinear case ($L > 1$), higher than NSVICA.
  • Figure 3: Real MEG data. a) Classification accuracies of linear SVMs newly trained with task-session data to predict stimulation labels in task-sessions, with feature extractors trained in advance with resting-session data. Error bars give standard errors of the mean across ten repetitions. For TCL and DAE, accuracies are given for different numbers of layers $L$. Horizontal line shows the chance level (25%). b) Example of spatial patterns of nonstationary components learned by TCL. Each small panel corresponds to one spatial pattern with the measurement helmet seen from three different angles (left, back, right); red/yellow is positive and blue is negative. "L3" shows approximate total spatial pattern of one selected third-layer unit. "L2" shows the patterns of the three second-layer units maximally contributing to this L3 unit. "L1" shows, for each L2 unit, the two most strongly contributing first-layer units.
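The score reported in Figure 2b, a mean absolute correlation coefficient between true and estimated components, could be computed along the following lines. This is a sketch under assumptions: the function name `mean_abs_correlation` is hypothetical, and matching estimated components to true ones by exhaustive search over permutations is an illustrative choice that is only practical for a small number of components.

```python
import itertools
import numpy as np

def mean_abs_correlation(true_components, estimated):
    """Mean |Pearson correlation| under the best one-to-one column matching.

    Both inputs have shape (n_samples, n_components) with equal n_components.
    """
    n = true_components.shape[1]
    # Cross-correlation block between true columns (rows) and estimated columns.
    corr = np.corrcoef(true_components, estimated, rowvar=False)[:n, n:]
    abs_corr = np.abs(corr)
    # Brute-force search over assignments; fine for a handful of components.
    return max(
        abs_corr[np.arange(n), list(perm)].mean()
        for perm in itertools.permutations(range(n))
    )
```

The absolute value makes the score insensitive to sign flips, and the matching step makes it insensitive to the ordering of the estimated components, neither of which an ICA-type method can determine.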

Theorems & Definitions (3)

  • Theorem 1
  • Corollary 1
  • Corollary 2