A theoretical framework for self-supervised contrastive learning for continuous dependent data
Alexander Marusov, Aleksandr Yugay, Alexey Zaytsev
TL;DR
This work addresses a gap in self-supervised learning (SSL) for continuous dependent data by relaxing the semantic-independence assumption and introducing a theory-backed framework that accounts for dependencies in time and space. It defines two ground-truth similarity paradigms, hard (MA-like) and soft (AR-like), and derives a closed-form, dependency-aware estimated similarity matrix $\widehat{\mathbf{G}}$ via a log-regularized optimization, so that the loss remains meaningful when samples are correlated. The authors instantiate the framework as Dependent TS2Vec (DepTS2Vec), which integrates the new loss into the TS2Vec architecture, and validate it on temporal benchmarks (UCR/UEA) and spatio-temporal climate tasks (drought prediction, temperature forecasting), reporting consistent gains over strong baselines such as TS2Vec and SoftCL. They also provide theoretical guarantees (necessary and sufficient conditions) for the estimated similarity matrix and discuss practical extensions (anomaly masking, concept drift) that broaden applicability to other dependent-data domains. Overall, the framework yields principled, interpretable objectives and empirically strong representations for continuous dependent data, with potential impact on time-series analysis and climate-informed decision making.
Abstract
Self-supervised learning (SSL) has emerged as a powerful approach to learning representations, particularly in computer vision. However, its application to dependent data, such as temporal and spatio-temporal data, remains underexplored. Moreover, traditional contrastive SSL methods often assume \emph{semantic independence between samples}, which does not hold for dependent data exhibiting complex correlations. We propose a novel theoretical framework for contrastive SSL tailored to \emph{continuous dependent data}, which allows nearby samples to be semantically close to each other. In particular, we propose two possible \textit{ground truth similarity measures} between objects: \emph{hard} and \emph{soft} closeness. Under this framework, we derive an analytical form for the \textit{estimated similarity matrix} that accommodates both types of closeness between samples, thereby introducing dependency-aware loss functions. We validate our approach, \emph{Dependent TS2Vec}, on temporal and spatio-temporal downstream problems. By accounting for the dependency patterns present in the data, our approach surpasses modern methods for dependent data, highlighting the effectiveness of our theoretically grounded loss functions for SSL in capturing spatio-temporal dependencies. Specifically, we outperform TS2Vec on the standard UEA and UCR benchmarks, with accuracy improvements of $4.17$\% and $2.08$\%, respectively. Furthermore, on the drought classification task, which involves complex spatio-temporal patterns, our method achieves a $7$\% higher ROC-AUC score.
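
To make the distinction between hard and soft closeness concrete, the following is a minimal illustrative sketch, not the definitions used in this work. It assumes that hard closeness can be pictured as a banded 0/1 matrix (similarity cuts off beyond a fixed lag, as the autocorrelation of an MA process does) and soft closeness as a geometrically decaying matrix (as the autocorrelation of an AR(1) process does). The function names and the parameters `window` and `rho` are hypothetical.

```python
def hard_similarity(n, window):
    """Assumed 'hard' closeness: samples i and j are similar iff |i - j| <= window."""
    return [[1.0 if abs(i - j) <= window else 0.0 for j in range(n)]
            for i in range(n)]


def soft_similarity(n, rho):
    """Assumed 'soft' closeness: similarity decays as rho ** |i - j|."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


# Hypothetical example: five consecutive time steps.
G_hard = hard_similarity(5, window=1)   # only immediate neighbours are similar
G_soft = soft_similarity(5, rho=0.5)    # similarity halves with each extra lag

for row in G_hard:
    print(row)
for row in G_soft:
    print(row)
```

In both matrices each sample is maximally similar to itself; they differ only in how similarity falls off with distance, which is the property a dependency-aware loss would need to respect.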
