Label-efficient Time Series Representation Learning: A Review

Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee-Keong Kwoh, Xiaoli Li

TL;DR

This survey tackles the challenge of learning effective time series representations with limited labeled data. It introduces a taxonomy that separates in-domain from cross-domain strategies, and surveys data augmentation, self-supervised learning, semi-supervised learning, and domain adaptation within these categories. The work highlights concrete methods, their losses, and representative techniques (e.g., InfoNCE, MMD, adversarial domain adaptation), analyzes their advantages and limitations, and advocates hybrid approaches and standardized benchmarks. The findings underscore the practical importance of label-efficient methods for domains like healthcare and industry, where labeled data are scarce but unlabeled data abound, and they provide a roadmap for future research in hybrid strategies and domain-transfer-aware design.

Abstract

Label-efficient time series representation learning, which aims to learn effective representations with limited labeled data, is crucial for deploying deep learning models in real-world applications. To address the scarcity of labeled time series data, various strategies, e.g., transfer learning, self-supervised learning, and semi-supervised learning, have been developed. In this survey, we introduce a novel taxonomy that categorizes existing approaches as in-domain or cross-domain, according to whether they rely on external data sources. Furthermore, we review recent advances in each strategy, summarize the limitations of current methodologies, and suggest future research directions that promise further improvements in the field.

Paper Structure (38 sections, 3 equations, 7 figures, 2 tables)


Figures (7)

  • Figure 1: Graphical overview of the different data availability scenarios and the suitable solution for each. Note that the arrows do not indicate the only solution for each scenario, but rather the one that can yield the best performance.
  • Figure 2: Data augmentation is utilized to increase the number of training samples. For each input sample, several augmentations are applied to generate new samples.
  • Figure 3: The pipeline of the self-supervised learning process. In the pretraining phase, the encoder is trained with the self-supervised loss without labels. Next, in the fine-tuning phase, the pretrained encoder is fine-tuned with the few available labeled samples.
  • Figure 4: Illustration of the semi-supervised learning process. We train the model with the labeled portion of the data via the cross-entropy loss $\mathcal{L}_{\text{cls}}$, and the unlabeled portion via an unsupervised learning technique $\mathcal{L}_{\text{unsup}}$, i.e., $\mathcal{L}_{\text{SemiSL}} = \mathcal{L}_{\text{cls}} + \mathcal{L}_{\text{unsup}}$.
  • Figure 5: The three phases of the transfer learning process. In the first phase, the model is trained on the labeled source domain data. In the second phase, the pretrained model is fine-tuned on the few available target domain samples. Last, the model is tested on the unlabeled target domain data during the inference phase.
  • ...and 2 more figures
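
The combined objective in the Figure 4 caption, $\mathcal{L}_{\text{SemiSL}} = \mathcal{L}_{\text{cls}} + \mathcal{L}_{\text{unsup}}$, can be illustrated with a minimal NumPy sketch. The caption does not fix a particular unsupervised term, so the entropy-minimization loss used here for $\mathcal{L}_{\text{unsup}}$ is an assumption chosen only for illustration, and all function names (`cross_entropy`, `entropy_loss`, `semi_supervised_loss`) are hypothetical rather than taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # L_cls: mean negative log-likelihood of the true class on labeled data.
    probs = softmax(logits)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(true_class_probs + 1e-12).mean())

def entropy_loss(logits):
    # Assumed stand-in for L_unsup: mean prediction entropy on unlabeled data.
    # The survey covers other choices; this one is illustrative only.
    probs = softmax(logits)
    return float(-(probs * np.log(probs + 1e-12)).sum(axis=-1).mean())

def semi_supervised_loss(labeled_logits, labels, unlabeled_logits):
    # L_SemiSL = L_cls + L_unsup, following the Figure 4 caption.
    return cross_entropy(labeled_logits, labels) + entropy_loss(unlabeled_logits)

# Toy logits standing in for the model outputs on each portion of the data.
rng = np.random.default_rng(0)
labeled_logits = rng.normal(size=(4, 3))     # 4 labeled samples, 3 classes
labels = np.array([0, 2, 1, 1])
unlabeled_logits = rng.normal(size=(16, 3))  # 16 unlabeled samples
loss = semi_supervised_loss(labeled_logits, labels, unlabeled_logits)
```

In practice the two terms are often weighted against each other; the caption states an unweighted sum, so the sketch keeps it that way.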