Computer Vision Self-supervised Learning Methods on Time Series

Daesoo Lee, Erlend Aune

TL;DR

This study examines whether mainstream computer-vision SSL frameworks can transfer to time-series data, using the UCR and UEA benchmarks. It benchmarks contrastive and non-contrastive methods (e.g., SimCLR, BYOL, SimSiam, Barlow Twins, VICReg) and introduces VIbCReg, an improved VICReg variant with a normalized covariance matrix and IterNorm to accelerate learning. VIbCReg achieves state-of-the-art linear evaluation and competitive SVM performance on UCR/UEA, while using a lightweight 1D-ResNet encoder, demonstrating cross-domain applicability of CV SSL ideas to time series. The work highlights faster representation learning and stronger feature decorrelation, suggesting future work on longer sequences and more flexible encoders. Overall, VIbCReg offers a robust, transfer-friendly SSL approach for time-series representation learning with practical implications for industrial, financial, and IoT data analysis.

Abstract

Self-supervised learning (SSL) has had great success in computer vision. Most of the current mainstream computer vision SSL frameworks are based on the Siamese network architecture. These approaches often rely on cleverly crafted loss functions and training setups to avoid feature collapse. In this study, we evaluate whether those computer-vision SSL frameworks are also effective on a different modality (i.e., time series). We evaluate their effectiveness on the UCR and UEA archives and show that the computer vision SSL frameworks can be effective even for time series. In addition, we propose a new method that improves on the recently proposed VICReg method. Our method improves on the covariance term proposed in VICReg, and we additionally augment the head of the architecture with an iterative normalization layer that accelerates the convergence of the model.

Paper Structure

This paper contains 51 sections, 14 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Comparison of Siamese architecture-based SSL frameworks. Siamese denotes the Siamese architecture (Koch et al., 2015) and the others are SSL frameworks. The encoder includes all layers that can be shared between both branches. The dashed lines indicate the gradient propagation flow; the absence of a dashed line therefore denotes stop-gradient.
  • Figure 2: Comparison between VICReg and VIbCReg, where the difference is highlighted in red. In VIbCReg, two batches of different views $X$ and $X^\prime$ are taken from a batch of input data $\mathrm{D}$, and representations $Y$ and $Y^\prime$ are obtained through the encoder. The representations are then projected to a higher dimension via the projector and the iterative normalization (IterNorm) layer, yielding $Z$ and $Z^\prime$. Finally, the similarity between $Z$ and $Z^\prime$ and the variance along the batch dimension are maximized, while the feature components of $Z$ and $Z^\prime$ are decorrelated from each other.
  • Figure 3: t-SNE visualization of representations learned with VICReg and VIbCReg. For each set of subfigures, the left subfigure is from VICReg and the right subfigure is from VIbCReg. The dataset name is shown below each set of subfigures, and different colors represent different classes.
  • Figure 4: 5-kNN classification accuracy on the UCR datasets during representation learning.
  • Figure 5: Comparative linear evaluation of VICReg and VIbCReg.
  • ...and 4 more figures
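
The three objectives named in the Figure 2 caption (similarity between $Z$ and $Z^\prime$, variance along the batch dimension, and feature decorrelation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `vibcreg_style_terms`, the hinge target `gamma`, the constant `eps`, and the 1/D^2 scaling of the covariance term are assumptions made for illustration, and the IterNorm layer and any loss weights are left out. The only detail taken from the text is that the covariance matrix is normalized, which is sketched here by scaling each centered feature column to unit length.

```python
import numpy as np

def vibcreg_style_terms(z, z_prime, gamma=1.0, eps=1e-4):
    """Return the three loss terms for two views of shape (batch, features).

    Hypothetical sketch: gamma, eps and the 1/D^2 scaling are assumptions.
    """
    d = z.shape[1]

    # Similarity (invariance): mean squared distance between the two views.
    invariance = np.mean((z - z_prime) ** 2)

    def variance_term(x):
        # Hinge on the per-feature standard deviation along the batch axis;
        # it is zero once every feature has std >= gamma.
        std = np.sqrt(x.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, gamma - std))

    def covariance_term(x):
        # Center, then scale each feature column to unit l2 norm so that the
        # resulting covariance entries lie in [-1, 1] (normalized covariance).
        x = x - x.mean(axis=0)
        x = x / (np.linalg.norm(x, axis=0, keepdims=True) + eps)
        c = x.T @ x
        off_diag = c - np.diag(np.diag(c))
        # Penalize only the off-diagonal entries (feature decorrelation).
        return np.sum(off_diag ** 2) / (d * d)

    return {
        "invariance": invariance,
        "variance": variance_term(z) + variance_term(z_prime),
        "covariance": covariance_term(z) + covariance_term(z_prime),
    }
```

In this sketch, identical views give a zero invariance term, a collapsed batch (every sample the same) gives a large variance term, and duplicated feature columns give a large covariance term. Minimizing a weighted sum of the three terms would therefore push toward similar views, non-collapsed features, and decorrelated feature components.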