FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification

Tian Tian, Chunyan Miao, Hangwei Qian

TL;DR

FreRA tackles augmentation design for time series contrastive learning by moving to the frequency domain and automatically refining Fourier components. It learns a single vector $\mathbf{s}$ that separates critical from unimportant frequency components, then applies semantic-aware identity modification to the critical set and semantic-agnostic self-adaptive modification (distortion) to the unimportant set, yielding semantic-preserving views. The approach is supported by theoretical results showing that mutual information (MI) is preserved under mild assumptions, and it integrates in a plug-and-play manner with standard contrastive losses such as InfoNCE. Empirically, FreRA achieves state-of-the-art or competitive performance across 135 datasets spanning time series classification, anomaly detection, and transfer learning, with strong generalization and robustness to hyperparameters.

Abstract

Contrastive learning has emerged as a competent approach for unsupervised representation learning. However, the design of an optimal augmentation strategy, although crucial for contrastive learning, is less explored for time series classification tasks. Existing predefined time-domain augmentation methods are primarily adopted from vision and are not specific to time series data. Consequently, this cross-modality incompatibility may distort the semantically relevant information of time series by introducing mismatched patterns into the data. To address this limitation, we present a novel perspective from the frequency domain and identify three advantages for downstream classification: global, independent, and compact. To fully utilize the three properties, we propose the lightweight yet effective Frequency-Refined Augmentation (FreRA) tailored for time series contrastive learning on classification tasks, which can be seamlessly integrated with contrastive learning frameworks in a plug-and-play manner. Specifically, FreRA automatically separates critical and unimportant frequency components. Accordingly, we propose semantic-aware Identity Modification and semantic-agnostic Self-adaptive Modification to protect semantically relevant information in the critical frequency components and infuse variance into the unimportant ones, respectively. Theoretically, we prove that FreRA generates semantic-preserving views. Empirically, we conduct extensive experiments on two benchmark archives, UCR and UEA, as well as five large-scale datasets on diverse applications. FreRA consistently outperforms ten leading baselines on time series classification, anomaly detection, and transfer learning tasks, demonstrating superior capabilities in contrastive representation learning and generalization in transfer learning scenarios across diverse datasets.
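
The split between critical and unimportant frequency components can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`dft`, `idft`, `refined_view`), the fixed score vector `s`, the `threshold`, and the Gaussian rescaling controlled by `noise_scale` are all assumptions made for illustration. In FreRA the vector is learned jointly with the contrastive model and the distortion is self-adaptive; here the scores are simply given.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * m * t / n) for t in range(n))
            for m in range(n)]


def idft(X):
    """Inverse of dft; returns complex values."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * math.pi * m * t / n) for m in range(n)) / n
            for t in range(n)]


def refined_view(x, s, rng, threshold=0.0, noise_scale=0.5):
    """Keep components with s[m] > threshold unchanged; randomly rescale the rest.

    s holds one score per non-negative frequency (len(x) // 2 + 1 entries).
    Hypothetical simplification: s is given rather than learned, and the
    distortion is plain Gaussian rescaling.
    """
    n = len(x)
    X = dft(x)
    weights = [1.0 if score > threshold
               else 1.0 + noise_scale * rng.gauss(0.0, 1.0)
               for score in s]
    # Bin m and its mirror n - m share one real weight, so the view stays real.
    Y = [X[m] * weights[min(m, n - m)] for m in range(n)]
    return [value.real for value in idft(Y)]


# Toy series: a dominant component at frequency 1 plus a weaker one at frequency 5.
rng = random.Random(0)
n = 16
x = [math.sin(2 * math.pi * t / n) + 0.3 * math.sin(2 * math.pi * 5 * t / n)
     for t in range(n)]
s = [-1.0] * (n // 2 + 1)
s[1] = 1.0  # treat frequency 1 as the only critical component
view = refined_view(x, s, rng)
```

In this toy setting the view keeps the frequency-1 component exactly as in `x`, while the frequency-5 component is rescaled, which mirrors the idea of protecting critical components and infusing variance into the rest.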

Paper Structure

This paper contains 23 sections, 4 theorems, 12 equations, 7 figures, 10 tables.

Key Result

Proposition 1

(Conservation of Entropy) Let $\textsf{x}$ and $\textsf{x}_f$ be the random variables denoting the time series in the time domain and the frequency domain, respectively. Then $\text{H}(\textsf{x})=\text{H}(\textsf{x}_f)$.
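
One standard way to see why such a statement is plausible is sketched below. It is offered as textbook information theory, not as the paper's own proof. The symbol $\mathcal{F}$ for the discrete Fourier transform is notation introduced here, and the argument assumes discrete-valued random variables, for which a deterministic map cannot increase entropy. Because $\mathcal{F}$ is invertible, the inequality applies in both directions:

```latex
\begin{aligned}
\text{H}(\textsf{x}_f) &= \text{H}\big(\mathcal{F}(\textsf{x})\big) \le \text{H}(\textsf{x}), \\
\text{H}(\textsf{x}) &= \text{H}\big(\mathcal{F}^{-1}(\textsf{x}_f)\big) \le \text{H}(\textsf{x}_f), \\
&\Longrightarrow\; \text{H}(\textsf{x}) = \text{H}(\textsf{x}_f).
\end{aligned}
```

For continuous-valued series the analogous argument uses differential entropy and additionally requires the transform to have unit Jacobian determinant, as an orthonormal transform does.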

Figures (7)

  • Figure 1: Our method (blue curve) achieves the highest MI between the views generated and the label, enabling better semantic preservation compared with SOTA. The semantically relevant information is well preserved to facilitate contrastive representation learning.
  • Figure 2: An overview of the proposed FreRA. The left-hand side presents the detailed design of FreRA: semantic-aware identity modification on critical components and semantic-agnostic self-adaptive modification on unimportant components are conducted in the frequency domain to maintain contextual information and infuse variance, respectively. The matching colors between $\mathbf{s}$ and $\mathbf{w}_\text{dist}$ on unimportant components illustrate the adaptive distortion. The independent manipulations in FreRA ensure that the added variance does not affect the critical, semantically relevant information. $X(m)$ and $x(n)$ denote the frequency-domain and time-domain representations of the time series, respectively, where $m$ is the index of the frequency component and $n$ is the timestamp index. As a plug-and-play component, FreRA can be jointly trained with any contrastive learning framework, as illustrated on the right-hand side. The contrastive learning model is pre-trained in the time domain. FreRA encourages the compactness of critical frequency components and the consistency of positive pairs' representations. In evaluation, a classifier is trained on top of the frozen pre-trained encoder to obtain predictions for downstream tasks.
  • Figure 3: Performance of FreRA on the 3 HAR datasets under varying $\lambda$, in comparison to their second-best baselines.
  • Figure 4: Taking the UCIHAR dataset as an example, the energy in the frequency domain, $E=\frac{1}{L} \sum_{m=0}^{L-1} \lvert X(m) \rvert ^2$, is mostly concentrated in a compact set of frequency components, namely those with the ten lowest frequencies. The solid line represents the average energy of the frequency components in the UCIHAR dataset, and the shaded area indicates the range.
  • Figure 5: We aim for the intersection point indicated by the green arrow, where $\text{MI}(\textsf{x}_{\text{crit}}; \textsf{x}) = \text{MI}(\textsf{x}; \textsf{y})$, meaning the critical frequency components retain all of the semantically relevant information and nothing else. The linearity of $\text{MI}(\mathbf{w}_{\text{crit}} \odot \textsf{x}_f; \textsf{x}_f)$ is for illustration purposes only.
  • ...and 2 more figures
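
The energy expression in the Figure 4 caption can be evaluated per frequency component with a short sketch. The function names (`component_energies`, `low_frequency_share`) and the synthetic test signal are assumptions for illustration; no UCIHAR data is loaded, and the numbers produced say nothing about that dataset.

```python
import cmath
import math


def component_energies(x):
    """Return |X(m)|^2 / L for every DFT bin m = 0..L-1 of the series x."""
    length = len(x)
    energies = []
    for m in range(length):
        X_m = sum(x[n] * cmath.exp(-2j * math.pi * m * n / length)
                  for n in range(length))
        energies.append(abs(X_m) ** 2 / length)
    return energies


def low_frequency_share(x, k):
    """Fraction of total energy carried by the k lowest frequencies.

    Bin m and its mirror L - m belong to the same frequency min(m, L - m).
    """
    length = len(x)
    energies = component_energies(x)
    low = sum(e for m, e in enumerate(energies) if min(m, length - m) < k)
    return low / sum(energies)


# Synthetic series: a strong slow oscillation plus a weak fast one.
length = 64
x = [math.sin(2 * math.pi * 2 * n / length)
     + 0.1 * math.sin(2 * math.pi * 20 * n / length)
     for n in range(length)]
energies = component_energies(x)
share = low_frequency_share(x, 10)
```

By Parseval's theorem, the sum of the per-component energies equals the time-domain energy $\sum_n x(n)^2$, so `share` can be read as the fraction of the signal's energy held by the ten lowest frequencies.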

Theorems & Definitions (5)

  • Definition 1: Optimal View Generator
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4