PFML: Self-Supervised Learning of Time-Series Data Without Representation Collapse

Einari Vaaras, Manu Airaksinen, Okko Räsänen

TL;DR

PFML tackles representation collapse in time-series self-supervised learning by reframing pre-training as predicting statistical functionals of the input signal for masked latent embeddings, rather than reconstructing the masked inputs or their latent representations. The input is split into frames, each frame is encoded, a subset of the latent embeddings is masked, and the model predicts a diverse set of functionals for the masked frames, which reduces task complexity and preserves variance in the representations. Across infant IMU data, speech, and EEG, PFML outperforms MAE and TS2Vec and matches data2vec while avoiding collapse, and it is easier to apply to new domains. The approach offers a practical, modality-agnostic route to robust SSL for time series, with potential extensions to other domains such as images, where functionals could summarize patch-level information.
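To make "collapse" concrete, the following is a minimal sketch of one way to check for it: measure how much a batch of embeddings varies across inputs. The function name `embedding_spread` and the synthetic data are assumptions made for illustration; this is not a diagnostic taken from the paper.

```python
import numpy as np

def embedding_spread(emb):
    """Mean per-dimension standard deviation across samples.

    emb has shape (n_samples, emb_dim). A value near zero means every
    input maps to (almost) the same vector, i.e. a collapsed representation.
    """
    return float(emb.std(axis=0).mean())

rng = np.random.default_rng(0)

# Hypothetical "healthy" embeddings: each input gets a different vector.
healthy = rng.standard_normal((256, 16))

# Hypothetical collapsed embeddings: the same vector for every input.
collapsed = np.tile(rng.standard_normal(16), (256, 1))

print("healthy spread:  ", embedding_spread(healthy))
print("collapsed spread:", embedding_spread(collapsed))
```

A spread that shrinks toward zero during pre-training would be one warning sign that the model is ignoring its input.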

Abstract

Self-supervised learning (SSL) is a data-driven learning approach that utilizes the innate structure of the data to guide the learning process. In contrast to supervised learning, which depends on external labels, SSL utilizes the inherent characteristics of the data to produce its own supervisory signal. However, one frequent issue with SSL methods is representation collapse, where the model outputs a constant input-invariant feature representation. This issue hinders the potential application of SSL methods to new data modalities, as trying to avoid representation collapse wastes researchers' time and effort. This paper introduces a novel SSL algorithm for time-series data called Prediction of Functionals from Masked Latents (PFML). Instead of predicting masked input signals or their latent representations directly, PFML operates by predicting statistical functionals of the input signal corresponding to masked embeddings, given a sequence of unmasked embeddings. The algorithm is designed to avoid representation collapse, rendering it straightforwardly applicable to different time-series data domains, such as novel sensor modalities in clinical data. We demonstrate the effectiveness of PFML through complex, real-life classification tasks across three different data modalities: infant posture and movement classification from multi-sensor inertial measurement unit data, emotion recognition from speech data, and sleep stage classification from EEG data. The results show that PFML is superior to a conceptually similar SSL method and a contrastive learning-based SSL method. Additionally, PFML is on par with the current state-of-the-art SSL method, while also being conceptually simpler and without suffering from representation collapse.
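The pipeline described in the abstract can be sketched roughly as follows: frame the signal, compute per-frame functionals as targets, mask some frame embeddings, and score predictions only on the masked frames. This is a hedged illustration, not the authors' implementation. The functional set (mean, standard deviation, minimum, maximum), the linear stand-in encoder and predictor, the zero-vector mask token, the 50% masking rate, and the mean-squared-error loss are all assumptions chosen for brevity.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames: (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def functionals(frames):
    """Per-frame statistical functionals (illustrative set, assumed here)."""
    return np.stack(
        [frames.mean(axis=1), frames.std(axis=1),
         frames.min(axis=1), frames.max(axis=1)],
        axis=1,
    )

def masked_functional_loss(pred, target, mask):
    """Mean squared error computed over the masked frames only (assumed loss)."""
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)                    # toy single-channel signal

frames = frame_signal(x, frame_len=100, hop=50)  # (19, 100)
targets = functionals(frames)                    # (19, 4), from the raw input

# Hypothetical linear encoder standing in for a learned one.
W_enc = rng.standard_normal((100, 8)) * 0.1
emb = frames @ W_enc                             # (19, 8)

# Mask a random subset of embeddings with a fixed mask token.
mask = rng.random(len(frames)) < 0.5
mask[0] = True                                   # guarantee one masked frame
emb_masked = np.where(mask[:, None], np.zeros(8), emb)

# Hypothetical per-frame linear predictor; a real model would use the
# unmasked embeddings of neighbouring frames as context.
W_pred = rng.standard_normal((8, 4)) * 0.1
pred = emb_masked @ W_pred                       # (19, 4)

print("loss on masked frames:", masked_functional_loss(pred, targets, mask))
```

Because the targets are computed from the input signal rather than from the model's own outputs, they vary with the input by construction, which gives an intuition for why a constant output cannot trivially satisfy this objective.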

Paper Structure

This paper contains 16 sections, 2 equations, 1 figure, and 4 tables.

Figures (1)

  • Figure 1: An overview of the PFML pre-training pipeline. Note that in the figure, the input signal has only a single channel, whereas PFML can also be applied to multi-channel time-series data.