United We Pretrain, Divided We Fail! Representation Learning for Time Series by Pretraining on 75 Datasets at Once

Maurice Kraus, Felix Divo, David Steinmann, Devendra Singh Dhami, Kristian Kersting

TL;DR

The paper tackles the challenge that pretraining on time-series data often fails to generalize across datasets. It introduces XIT, a self-supervised framework that learns a single encoder from 75 unlabeled time-series datasets by combining cross-dataset interpolation (XD-MixUp) with a soft interpolation contextual contrasting loss (SICC) and temporal contrastive learning. Across extensive experiments, XIT yields transferable representations that outperform supervised training and several self-supervised baselines, especially in low-data regimes, and ablations confirm the necessity of each component. Latent-space analyses indicate that the learned representations organize into structured clusters without labels, supporting the practical impact of the approach for scalable, data-efficient time-series classification across diverse domains.

Abstract

In natural language processing and vision, pretraining is utilized to learn effective representations. Unfortunately, the success of pretraining does not easily carry over to time series due to potential mismatch between sources and target. Actually, common belief is that multi-dataset pretraining does not work for time series! Au contraire, we introduce a new self-supervised contrastive pretraining approach to learn one encoding from many unlabeled and diverse time series datasets, so that the single learned representation can then be reused in several target domains for, say, classification. Specifically, we propose the XD-MixUp interpolation method and the Soft Interpolation Contextual Contrasting (SICC) loss. Empirically, this outperforms both supervised training and other self-supervised pretraining methods when finetuning on low-data regimes. This disproves the common belief: We can actually learn from multiple time series datasets, even from 75 at once.
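
The interpolation idea behind XD-MixUp can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes two equal-length univariate series and, as in standard MixUp, draws the mixing coefficient from a Beta(alpha, alpha) distribution. The function name `interpolate_pair`, the default `alpha`, and the toy inputs are hypothetical.

```python
import random


def interpolate_pair(x_a, x_b, alpha=0.2, rng=random):
    """Convexly mix two equal-length series and return (mixed, lam).

    Assumption: lam ~ Beta(alpha, alpha), as in standard MixUp; the paper's
    exact sampling scheme and handling of unequal lengths are not shown here.
    """
    if len(x_a) != len(x_b):
        raise ValueError("series must have equal length")
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    return mixed, lam


# Toy series standing in for samples from two different datasets.
rng = random.Random(0)
series_a = [0.0, 1.0, 0.0, -1.0]
series_b = [2.0, 2.0, 2.0, 2.0]
mixed, lam = interpolate_pair(series_a, series_b, rng=rng)
```

The returned `lam` records how close the interpolated series is to each source. A soft contrastive objective such as SICC can use a coefficient of this kind as a target, although the exact loss is defined in the paper and not reproduced here.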

Paper Structure

This paper contains 17 sections, 6 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: The core idea of our method XIT is to learn a single encoder from multiple datasets. The resulting representation can then be used to train classifiers on datasets seen during the pretraining phase and to be transferred to entirely new ones.
  • Figure 2: Our proposed XIT architecture. From two time series of a mini-batch, we generate a randomly interpolated variant that gets augmented twice and projected several times along with the original time series. Eventually, we compute two losses to define the overall pretraining objective.
  • Figure 3: This box plot shows the transfer surplus over supervised training measured in Macro F1 score difference after pretraining XIT on increasingly large subsets of the UCR repository with three folds each. Higher is better.
  • Figure 4: Low-dimensional embedding visualization from a single XIT model pretrained on 75 UCR datasets and evaluated on three hold-out datasets. We also report the Davies-Bouldin Index (DBI) to support the visual observation. Lower DBI $\rightarrow$ more separated clusters.
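
The DBI mentioned in the Figure 4 caption can be computed as in the sketch below. It uses the standard Davies-Bouldin definition with Euclidean distances: for each cluster, take the worst-case ratio of summed within-cluster scatter to centroid distance, then average over clusters. The function name `davies_bouldin` and the toy points are hypothetical, and the paper's own evaluation code may differ.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin Index with Euclidean distance (lower = better separated).

    Assumes at least two clusters whose centroids do not coincide.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Centroid of each cluster: coordinate-wise mean of its points.
    centroids = {
        label: [sum(coord) / len(pts) for coord in zip(*pts)]
        for label, pts in clusters.items()
    }
    # Scatter of each cluster: mean distance of its points to the centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in pts) / len(pts)
        for label, pts in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Two toy embeddings: well-separated clusters versus overlapping ones.
separated = davies_bouldin([(0, 0), (1, 0), (10, 0), (11, 0)], [0, 0, 1, 1])
overlapping = davies_bouldin([(0, 0), (4, 0), (3, 0), (7, 0)], [0, 0, 1, 1])
```

In this toy case the well-separated embedding yields the smaller index, which matches the caption's reading that lower DBI indicates more separated clusters.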