Time-Series Contrastive Learning against False Negatives and Class Imbalance

Xiyuan Jin, Jing Wang, Lei Liu, Youfang Lin

TL;DR

This work tackles false negatives and class imbalance in time-series self-supervised contrastive learning under the InfoNCE framework. It analyzes the lower bounds of the losses $\mathcal{L}_{\text{uc}}$ and $\mathcal{L}_{\text{sc}}$ and introduces the SIP-LDL framework, which combines multiple-instances discrimination, instance graph convolution, and semi-supervised consistency classification to bridge unsupervised and supervised learning. Empirical results on HAR, Sleep-EDF, PhysioNet 2017, and TUSZ show state-of-the-art or competitive performance, with pronounced gains for minority classes and effective semi-supervised learning using as little as $10\%$ labeled data. The approach integrates readily with existing time-series contrastive learning (TCL) models and does not require substantial additional parameterization, making it practical for real-world physiological time-series applications.

Abstract

Time-series contrastive learning, a prominent self-supervised approach to representation learning, has advanced rapidly in recent research. Recent contrastive learning strategies have focused on how to construct appropriate positives and negatives. In this study, we conduct a theoretical analysis and find that they have overlooked two fundamental issues inherent in the InfoNCE loss-based framework: false negatives and class imbalance. We therefore introduce a straightforward modification grounded in the SimCLR framework that is adaptable to any model performing the instance discrimination task. By constructing instance graphs to facilitate interactive learning among instances, we emulate supervised contrastive learning via a multiple-instances discrimination task, mitigating the harmful impact of false negatives. Moreover, leveraging the graph structure and a small amount of labeled data, we perform semi-supervised consistency classification and enhance the representational ability of minority classes. We compare our method with the most popular time-series contrastive learning methods on four real-world time-series datasets and demonstrate significant advantages in overall performance.
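
The false-negative issue described in the abstract can be made concrete with a small sketch. The NumPy code below is a generic SimCLR-style InfoNCE (NT-Xent) loss, not the authors' SIP-LDL implementation. The function names `info_nce` and `false_negative_fraction`, the temperature of 0.5, and the toy labels are all assumptions chosen for illustration.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """Generic SimCLR-style InfoNCE loss for two augmented views.

    z1, z2: arrays of shape (N, D); row i of z1 and row i of z2 are
    two views of the same instance (the only positive pair).
    Every other row in the batch is treated as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0).astype(float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # drop self-pairs
    # Index of the positive for each of the 2N rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_norm
    return -log_prob[np.arange(2 * n), pos].mean()

def false_negative_fraction(labels):
    """Fraction of (anchor, negative) pairs that share a class label.

    Instance discrimination never sees labels, so these same-class
    pairs are pushed apart as if they were true negatives.
    """
    labels = np.asarray(labels)
    n = labels.shape[0]
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    return same.sum() / (n * (n - 1))

# Toy imbalanced batch (hypothetical labels): three majority, one minority.
labels = [0, 0, 0, 1]
print(false_negative_fraction(labels))  # 0.5
```

In this toy batch, half of all negative pairs are same-class pairs, and every one of them involves a majority-class anchor. That is one intuition for why false negatives and class imbalance interact under the InfoNCE loss.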

Paper Structure (26 sections, 13 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Loss distributions of the majority and minority classes on the balanced and imbalanced datasets as training progresses.
  • Figure 2: The overall architecture of the proposed SIP-LDL model.
  • Figure 3: Normalized performance patterns across sleep stages, cardiac arrhythmia types, and seizure types. The overall and per-class F1 scores were evaluated, and the result of the method with the highest accuracy was recorded as 1. (a) Sleep-EDF dataset. (b) PhysioNet 2017 dataset. (c) TUSZ dataset.
  • Figure 4: Comparison between supervised training vs. SIP-LDL for different few-labeled data scenarios in terms of MF1.