A theoretical framework for self-supervised contrastive learning for continuous dependent data

Alexander Marusov, Aleksandr Yugay, Alexey Zaytsev

TL;DR

This work addresses the gap in self-supervised learning (SSL) for continuous dependent data by relaxing the semantic-independence assumption and introducing a theory-backed framework that accounts for dependencies in time and space. It defines two ground-truth similarity paradigms—hard (MA-like) and soft (AR-like)—and derives a closed-form, dependency-aware estimated similarity matrix $\widehat{\mathbf{G}}$ via a log-regularized optimization, ensuring a meaningful loss despite correlations. The authors instantiate the framework in Dependent TS2Vec (DepTS2Vec), integrating the new loss into the TS2Vec architecture and validating on temporal benchmarks (UCR/UEA) and spatio-temporal climate tasks (drought prediction, temperature forecasting), with consistent performance gains over strong baselines such as TS2Vec and SoftCL. They also provide theoretical guarantees (necessary and sufficient conditions) for the estimated similarity matrix and discuss practical extensions (anomaly masking, concept drift) that broaden applicability to diverse dependent-data domains. Overall, the framework delivers principled, interpretable objectives and empirically strong representations for continuous dependent data, with potential impact across time-series analysis and climate-informed decision making.

Abstract

Self-supervised learning (SSL) has emerged as a powerful approach to learning representations, particularly in computer vision. However, its application to dependent data, such as temporal and spatio-temporal domains, remains underexplored. Moreover, traditional contrastive SSL methods often assume semantic independence between samples, which does not hold for dependent data exhibiting complex correlations. We propose a novel theoretical framework for contrastive SSL tailored to continuous dependent data, which allows the nearest samples to be semantically close to each other. In particular, we propose two possible ground-truth similarity measures between objects: hard and soft closeness. Under this framework, we derive an analytical form for the estimated similarity matrix that accommodates both types of closeness between samples, thereby introducing dependency-aware loss functions. We validate our approach, Dependent TS2Vec, on temporal and spatio-temporal downstream problems. By accounting for the dependency patterns present in the data, our approach surpasses modern methods for dependent data, highlighting the effectiveness of our theoretically grounded loss functions for SSL in capturing spatio-temporal dependencies. Specifically, we outperform TS2Vec on the standard UEA and UCR benchmarks, with accuracy improvements of 4.17% and 2.08%, respectively. Furthermore, on the drought classification task, which involves complex spatio-temporal patterns, our method achieves a 7% higher ROC-AUC score.

Paper Structure

This paper contains 47 sections, 2 theorems, 18 equations, 2 figures, 5 tables.

Key Result

Theorem 1

Solving the optimization problem (opt_prob) with $\mathcal{R}_{\log}$ regularization gives a closed-form estimate of the similarity matrix $\widehat{\mathbf{G}}$, expressed through the embedding distance $d_f(\mathbf{x}, \mathbf{x}') = d(f_{\theta}(\mathbf{x}), f_{\theta}(\mathbf{x}'))$.
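
To make the notation concrete, the sketch below computes $d_f$ for a few samples. It is an illustration only, not the paper's implementation: the linear `encoder` stands in for $f_{\theta}$, the Euclidean distance stands in for $d$, and the names `encoder`, `euclidean`, `d_f`, and `pairwise_d_f` are all hypothetical. The closed-form estimate from Theorem 1 is not reproduced here.

```python
import math


def encoder(x, weights):
    """Hypothetical stand-in for f_theta: a linear map from input to embedding."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def euclidean(u, v):
    """Assumed choice of the base distance d between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def d_f(x, x_prime, weights, d=euclidean):
    """d_f(x, x') = d(f_theta(x), f_theta(x')): distance measured in embedding space."""
    return d(encoder(x, weights), encoder(x_prime, weights))


def pairwise_d_f(samples, weights, d=euclidean):
    """Matrix of d_f values for every pair of samples."""
    return [[d_f(a, b, weights, d) for b in samples] for a in samples]


# Toy usage: an identity encoder, so d_f reduces to the plain Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
samples = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
distances = pairwise_d_f(samples, identity)
```

Any estimated similarity matrix built on $d_f$ takes a pairwise matrix like `distances` as its input; samples that the encoder maps close together get small entries.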

Figures (2)

  • Figure 1: The autocorrelation function (ACF, top row) visualizes the dependency between objects in a time series, and the correlation matrices (bottom row) show pairwise similarities between elements. For the ACF, higher values correspond to stronger correlations; for the matrices, darker colors correspond to higher correlations. We provide plots for three possible semantic connections between samples: semantic independence and our two proposed dependency types, hard and soft. Under semantic independence, no samples are semantically related to each other. Under hard dependency, as in an MA(1) process, adjacent samples are semantically connected while all others are not. Under soft dependency, as in an AR process, the closeness between elements decreases exponentially with the distance between them.
  • Figure 2: Comparison of the methods via RMSE
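
The three similarity patterns described in the Figure 1 caption can be sketched as small matrices. This is a minimal illustration under stated assumptions, not the paper's construction: the function names and the strength parameters `rho` and `phi` are hypothetical, and samples are assumed to be ordered so that $|i - j|$ is the distance between samples $i$ and $j$.

```python
def independent_similarity(n):
    """Semantic independence: each sample is similar only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def hard_similarity(n, rho=0.5):
    """Hard (MA(1)-like) closeness: only adjacent samples are related.

    rho is an assumed strength for neighbours at distance 1.
    """
    def entry(i, j):
        lag = abs(i - j)
        if lag == 0:
            return 1.0
        if lag == 1:
            return rho
        return 0.0

    return [[entry(i, j) for j in range(n)] for i in range(n)]


def soft_similarity(n, phi=0.5):
    """Soft (AR-like) closeness: similarity decays exponentially as phi ** |i - j|."""
    return [[phi ** abs(i - j) for j in range(n)] for i in range(n)]


# First row of each 4 x 4 matrix, i.e. similarity of sample 0 to samples 0..3.
row_independent = independent_similarity(4)[0]
row_hard = hard_similarity(4, rho=0.5)[0]
row_soft = soft_similarity(4, phi=0.5)[0]
```

With these settings the hard pattern is a banded matrix that drops to zero beyond the immediate neighbour, while the soft pattern never reaches zero and halves with each additional step.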

Theorems & Definitions (5)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • proof