
Guidelines for Augmentation Selection in Contrastive Learning for Time Series Classification

Ziyu Liu, Azadeh Alavi, Minyi Li, Xiang Zhang

TL;DR

The paper tackles the critical yet underexplored problem of choosing augmentations for time-series contrastive learning. It develops a principled trend-seasonality framework that maps dataset characteristics to effective augmentations. The framework is validated on 12 synthetic and 6 real-world datasets, where it achieves an average Recall@3 of 0.667 and outperforms random and popularity baselines. By combining signal decomposition (including STL) with benchmarking on synthetic twin datasets, the approach yields a practical augmentation recommendation system (TS-ARM) that adapts to dataset properties and demonstrates strong real-world utility. The work provides actionable guidance and open-source tooling for deploying principled augmentation strategies in time-series contrastive learning across diverse domains.
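Recall@3 can be read as the share of a dataset's truly effective augmentations that appear among the top three recommendations. The minimal sketch below assumes that the ground truth for each dataset is its set of three best-performing augmentations from benchmarking. Under that assumption, an average of 0.667 is equivalent to two hits out of three per dataset on average. The functions `recall_at_k` and `mean_recall_at_k` and the augmentation names are hypothetical placeholders, not the paper's code or its exact augmentation list.

```python
from statistics import mean
from typing import Sequence


def recall_at_k(recommended: Sequence[str], relevant: Sequence[str], k: int = 3) -> float:
    """Share of the relevant augmentations that appear in the top-k recommendations."""
    if not relevant:
        raise ValueError("relevant must contain at least one augmentation")
    hits = set(recommended[:k]) & set(relevant)
    return len(hits) / len(set(relevant))


def mean_recall_at_k(cases: Sequence[tuple[Sequence[str], Sequence[str]]], k: int = 3) -> float:
    """Average Recall@k over several datasets, each given as (recommended, relevant)."""
    return mean(recall_at_k(rec, rel, k) for rec, rel in cases)


# Hypothetical example: ranked recommendations vs. each dataset's benchmark top-3.
cases = [
    (["jitter", "scaling", "permutation", "masking"], ["scaling", "masking", "jitter"]),
    (["masking", "cropping", "scaling"], ["masking", "cropping", "scaling"]),
]
print(mean_recall_at_k(cases))
```

You can score a random baseline the same way by passing a shuffled augmentation list as `recommended`.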

Abstract

Self-supervised contrastive learning has become a key technique in deep learning, particularly in time series analysis, because it can learn meaningful representations without explicit supervision. Augmentation is a critical component of contrastive learning. Different augmentations can dramatically affect performance, sometimes changing accuracy by over 30%. However, augmentations are predominantly selected either empirically, which can be suboptimal, or by grid search, which is time-consuming. In this paper, we establish a principled framework for selecting augmentations based on dataset characteristics such as trend and seasonality. Specifically, we construct 12 synthetic datasets incorporating trend, seasonality, and integration weights. We then evaluate the effectiveness of 8 different augmentations across these synthetic datasets, thereby inducing generalizable associations between time series characteristics and augmentation effectiveness. Additionally, we evaluate the induced associations on 6 real-world datasets encompassing domains such as activity recognition, disease diagnosis, traffic monitoring, electricity usage, mechanical fault prognosis, and finance. These real-world datasets are diverse. They range from 1 to 12 channels, 2 to 10 classes, sequence lengths of 14 to 1280, and sampling frequencies from 250 Hz to daily intervals. The experimental results show that our proposed trend-seasonality-based augmentation recommendation algorithm can accurately identify the effective augmentations for a given time series dataset. It achieves an average Recall@3 of 0.667 and outperforms baselines. Our work provides guidance for studies employing contrastive learning in time series analysis, with wide-ranging applications. All the code, datasets, and analysis results will be released at https://github.com/DL4mHealth/TS-Contrastive-Augmentation-Recommendation.
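The synthetic datasets combine trend, seasonality, and an integration weight. Their exact generators are not given here, so this minimal sketch makes several assumptions. It uses a linear trend, a sine seasonality, and the blend x_t = w * trend_t + (1 - w) * season_t + noise. To quantify each characteristic, it swaps in a classical moving-average decomposition in place of STL. It then applies the trend- and seasonal-strength measures described in Hyndman and Athanasopoulos's Forecasting: Principles and Practice: F_T = max(0, 1 - Var(R)/Var(T + R)) and F_S = max(0, 1 - Var(R)/Var(S + R)). The function names and parameter values are hypothetical, and whether TS-ARM uses these exact measures is an assumption.

```python
import numpy as np


def make_synthetic_series(n: int, period: int, w: float,
                          noise_std: float = 0.1, seed: int = 0) -> np.ndarray:
    """Hypothetical generator: blend a linear trend and a sine seasonality with weight w."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    trend = t / n                                   # rises from 0 towards 1
    season = np.sin(2 * np.pi * t / period)         # unit-amplitude cycle
    return w * trend + (1 - w) * season + rng.normal(0.0, noise_std, n)


def classical_decompose(x: np.ndarray, period: int):
    """Additive decomposition (a simple stand-in for STL); assumes an odd period."""
    if period % 2 == 0 or len(x) < 2 * period:
        raise ValueError("need an odd period and at least two full cycles")
    half = period // 2
    # Centered moving average over one full cycle estimates the trend.
    trend = np.convolve(x, np.ones(period) / period, mode="valid")
    x_mid = x[half:len(x) - half]                   # drop edges so x_mid aligns with trend
    detrended = x_mid - trend
    # Average the detrended values at each phase to get the seasonal pattern.
    phase = (np.arange(len(x_mid)) + half) % period
    pattern = np.array([detrended[phase == p].mean() for p in range(period)])
    seasonal = (pattern - pattern.mean())[phase]
    resid = detrended - seasonal
    return trend, seasonal, resid


def strengths(x: np.ndarray, period: int) -> tuple[float, float]:
    """Return (trend strength, seasonal strength), each in [0, 1]."""
    trend, seasonal, resid = classical_decompose(x, period)
    var_r = np.var(resid)
    f_trend = max(0.0, 1.0 - var_r / np.var(trend + resid))
    f_season = max(0.0, 1.0 - var_r / np.var(seasonal + resid))
    return float(f_trend), float(f_season)


for w in (0.25, 0.5, 0.75):                         # hypothetical integration weights
    f_t, f_s = strengths(make_synthetic_series(n=500, period=25, w=w), period=25)
    print(f"w={w}: trend strength={f_t:.2f}, seasonal strength={f_s:.2f}")
```

Raising w should shift a series from seasonality-dominated to trend-dominated, and the two strength scores should move accordingly. This is the kind of dataset characteristic the framework maps to augmentation choices.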

Paper Structure

This paper contains 30 sections, 12 equations, 11 figures, and 12 tables.

Figures (11)

  • Figure 1: Augmentation selection can affect model performance dramatically (real-world fault detection dataset).
  • Figure 1: Framework for contrastive learning, from liu2023self. The self-supervised contrastive learning pipeline is composed of three stages. a. Pre-training receives an unlabelled time series sample $\bm{x}_i$ as the anchor sample, its augmented version $\bm{x}'_i$ as the positive sample, and a different sample $\bm{x}_j$ as the negative sample. $\bm{h}_i$, $\bm{h}'_i$, and $\bm{h}_j$ denote the learned embeddings of the original sample $\bm{x}_i$, the positive sample $\bm{x}'_i$, and the negative sample $\bm{x}_j$, respectively. A contrastive loss is calculated from the distances among the sample embeddings and is used to update the encoder through backpropagation. b. The fine-tuning stage inherits the well-trained encoder, receives a labeled sample, and makes a prediction through a downstream classifier. A standard supervised loss function (e.g., cross-entropy) is used to update the encoder and/or classifier. c. The testing stage makes a prediction based on the learned embedding $\bm{h}_{\textrm{test}}$ of an unseen test sample $\bm{x}_{\textrm{test}}$.
  • Figure 2: Study schematic
  • Figure 2: Examples of augmentation methods used in benchmarking.
  • Figure 3: Generating synthetic datasets and benchmarking the augmentations. Combining two trends and two seasonalities yields 4 dataset groups. Within each dataset group, we generate 3 datasets by varying the integration weights. Finally, we rank the augmentations for each dataset.
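The pre-training stage in the framework caption (anchor, positive, and negative samples feeding a contrastive loss) can be illustrated with a small NumPy sketch. The sketch assumes several things. Jitter and scaling serve as two representative augmentations. A fixed random projection with tanh stands in for a trainable encoder. An InfoNCE-style loss serves as the contrastive objective. The captions do not name a specific loss or augmentation set, so these are illustrative choices, not the paper's configuration. Each anchor x_i is paired with its augmented view x'_i, and the other samples in the batch act as negatives x_j.

```python
import numpy as np


def jitter(x: np.ndarray, rng: np.random.Generator, sigma: float = 0.03) -> np.ndarray:
    # Add small Gaussian noise to every time step.
    return x + rng.normal(0.0, sigma, x.shape)


def scaling(x: np.ndarray, rng: np.random.Generator, sigma: float = 0.1) -> np.ndarray:
    # Multiply each sample (row) by one random factor drawn around 1.
    return x * rng.normal(1.0, sigma, (x.shape[0], 1))


def encode(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Toy stand-in encoder (assumption): fixed random projection followed by tanh.
    return np.tanh(x @ w)


def info_nce(h: np.ndarray, h_pos: np.ndarray, tau: float = 0.5) -> float:
    # Row i of h_pos is the positive for row i of h; all other rows act as negatives.
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    h_pos = h_pos / np.linalg.norm(h_pos, axis=1, keepdims=True)
    logits = (h @ h_pos.T) / tau                          # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
batch, length, dim = 16, 128, 32
x = rng.normal(size=(batch, length))                      # unlabelled anchors x_i
w = rng.normal(0.0, 1.0 / np.sqrt(length), (length, dim))
x_pos = scaling(jitter(x, rng), rng)                      # augmented positives x'_i
h, h_pos = encode(x, w), encode(x_pos, w)
loss_matched = info_nce(h, h_pos)
# Deliberately mismatch positives to see how the loss reacts.
loss_shuffled = info_nce(h, h_pos[np.roll(np.arange(batch), 1)])
print(f"matched pairs: {loss_matched:.3f}, mismatched pairs: {loss_shuffled:.3f}")
```

With correctly matched pairs, the loss should be clearly lower than with mismatched positives. Contrastive pre-training exploits this gap. A real setup would backpropagate this loss into a trainable encoder rather than a fixed projection.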