Time-Series Contrastive Learning against False Negatives and Class Imbalance
Xiyuan Jin, Jing Wang, Lei Liu, Youfang Lin
TL;DR
This work tackles false negatives and class imbalance in time-series self-supervised contrastive learning under the InfoNCE framework. It analyzes the lower bounds of the losses $\mathcal{L}_{\text{uc}}$ and $\mathcal{L}_{\text{sc}}$ and introduces the SIP-LDL framework, which combines multiple-instances discrimination, instance graph convolution, and semi-supervised consistency classification to bridge unsupervised and supervised learning. Empirical results on HAR, Sleep-EDF, PhysioNet 2017, and TUSZ show state-of-the-art or competitive performance, with pronounced gains for minority classes and effective semi-supervised learning using as little as $10\%$ labeled data. The approach is readily integrable with existing TCL models and does not require substantial additional parameterization, making it practical for real-world physiological time-series applications.
Abstract
Time-series contrastive learning, an exemplary self-supervised approach to representation learning, has advanced remarkably in recent research. While recent contrastive learning strategies have focused on how to construct appropriate positives and negatives, our theoretical analysis finds that they overlook two fundamental issues inherent in the InfoNCE loss-based framework: false negatives and class imbalance. We therefore introduce a straightforward modification grounded in the SimCLR framework that is universally adaptable to models performing the instance discrimination task. By constructing instance graphs to facilitate interactive learning among instances, we emulate supervised contrastive learning via a multiple-instances discrimination task, mitigating the harmful impact of false negatives. Moreover, leveraging the graph structure and a small amount of labeled data, we perform semi-supervised consistency classification and enhance the representation ability for minority classes. We compare our method with the most popular time-series contrastive learning methods on four real-world time-series datasets and demonstrate significant advantages in overall performance.
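
To make the false-negative issue concrete, the following is a minimal NumPy sketch, not the authors' implementation. It computes a generic SimCLR-style InfoNCE loss twice on the same embeddings: once with instance-level positives (only the other augmented view of the same sample), and once with label-level positives as in supervised contrastive learning. The function names (`contrastive_loss`, `instance_mask`, `label_mask`), the temperature of 0.5, and the toy data are all assumptions made for illustration. Same-class samples that the instance-level mask treats as negatives are the false negatives discussed above.

```python
import numpy as np


def contrastive_loss(z, pos_mask, temperature=0.5):
    """InfoNCE-style loss averaged over anchors (illustrative sketch).

    z:        (n, d) array of embeddings.
    pos_mask: (n, n) boolean array, True where column j is a positive for row i.
              The diagonal is ignored. Every row needs at least one positive.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    self_mask = np.eye(len(z), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # an anchor never contrasts with itself

    # Row-wise log-softmax, stabilised by subtracting the row maximum.
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    pos = pos_mask & ~self_mask
    # Average the log-probability over each anchor's positives.
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return float(-pos_log_prob.mean())


def instance_mask(n_samples):
    """Positives for instance discrimination: rows i and i + n_samples
    are two augmented views of the same sample."""
    n = 2 * n_samples
    idx = np.arange(n)
    return idx[:, None] == (idx[None, :] + n_samples) % n


def label_mask(labels):
    """Positives for supervised contrastive learning: same class label."""
    labels = np.asarray(labels)
    return labels[:, None] == labels[None, :]


# Toy batch (hypothetical data): 4 samples, 2 classes, 2 noisy views each.
rng = np.random.default_rng(0)
sample_labels = np.array([0, 0, 1, 1])
prototypes = rng.normal(size=(2, 8))
views = [prototypes[sample_labels] + 0.1 * rng.normal(size=(4, 8)) for _ in range(2)]
z = np.concatenate(views)               # (8, 8): view 1 stacked on view 2
labels = np.tile(sample_labels, 2)      # labels for all 8 embeddings

inst = instance_mask(len(sample_labels))
sup = label_mask(labels)

# Same-class pairs that instance discrimination pushes apart as negatives.
false_negatives = sup & ~inst & ~np.eye(len(z), dtype=bool)

print("instance-level loss:", contrastive_loss(z, inst))
print("label-level loss:   ", contrastive_loss(z, sup))
print("false negatives per anchor:", false_negatives.sum(axis=1))
```

In this sketch the only difference between the two losses is the positive mask, which is why a method that recovers likely same-class neighbours without labels (for example through an instance graph, as the abstract describes) can move the unsupervised objective toward the supervised one.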
