Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask
Zineb Senane, Lele Cao, Valentin Leonhard Buchner, Yusuke Tashiro, Lei You, Pawel Herman, Mats Nordahl, Ruibo Tu, Vilhelm von Ehrenheim
TL;DR
TSDE is a diffusion-based self-supervised framework for Time Series Representation Learning (TSRL). It learns general-purpose embeddings of multivariate time series (MTS) by conditioning a reverse diffusion process on a learnable MTS embedding. The embedding function combines dual-orthogonal Transformer encoders with a crossover mechanism, and an Imputation-Interpolation-Forecasting (IIF) mask splits each series into observed and masked parts, so the same model supports imputation, interpolation, forecasting, anomaly detection, classification, and clustering. Experiments on real-world datasets show state-of-the-art or competitive performance across these tasks, ablations confirm the contribution of the crossover architecture and the IIF mask, and an inference-speed comparison supports the method's efficiency. The result is a practical approach to learning versatile time-series representations that remain useful for downstream analysis when data are noisy or have many missing values.
Abstract
Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive. All four share a sensitivity to noise and to intricate data nuances. Recently, diffusion-based methods have shown strong generative capabilities. However, they primarily target specific application scenarios such as imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We then train a reverse diffusion process, conditioned on the resulting embeddings, to predict the noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.
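To make the observed/masked split concrete, the following is a minimal NumPy sketch of how an IIF-style mask and the noised masked part could be produced. It is an illustration under stated assumptions, not the authors' implementation: the function names `iif_mask` and `noisy_masked_part`, the `ratio` parameter, and the three masking patterns (random cells, whole time steps, trailing time steps) are hypothetical readings of "imputation", "interpolation", and "forecasting", and the noising step uses the standard DDPM closed form. The embedding function and the denoising network are omitted.

```python
import numpy as np


def iif_mask(shape, kind, ratio=0.2, rng=None):
    """Return a (features, time) binary mask: 1 = observed, 0 = masked.

    The three patterns are assumptions made for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_features, n_steps = shape
    mask = np.ones((n_features, n_steps), dtype=np.int8)
    n_hidden = max(1, int(round(ratio * n_steps)))

    if kind == "imputation":
        # Hide individual (feature, time) cells at random.
        mask[rng.random((n_features, n_steps)) < ratio] = 0
    elif kind == "interpolation":
        # Hide every feature at randomly chosen time steps.
        steps = rng.choice(n_steps, size=n_hidden, replace=False)
        mask[:, steps] = 0
    elif kind == "forecasting":
        # Hide the trailing time steps.
        mask[:, n_steps - n_hidden:] = 0
    else:
        raise ValueError(f"unknown mask kind: {kind!r}")
    return mask


def noisy_masked_part(x, mask, alpha_bar_t, rng):
    """Split x into its observed part and a noised masked part.

    Uses the DDPM closed form x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps,
    restricted to masked cells. Returns (x_obs, x_t, eps); eps on the
    masked cells is what a denoiser would be trained to predict.
    """
    hidden = 1 - mask
    eps = rng.standard_normal(x.shape) * hidden
    x_obs = x * mask
    x_t = (np.sqrt(alpha_bar_t) * x + np.sqrt(1.0 - alpha_bar_t) * eps) * hidden
    return x_obs, x_t, eps


# Usage: a toy series with 3 features and 10 time steps.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 10))
mask = iif_mask(x.shape, "forecasting", ratio=0.2, rng=rng)
x_obs, x_t, eps = noisy_masked_part(x, mask, alpha_bar_t=0.5, rng=rng)
print(mask)
```

In a full training loop, `x_obs` would be passed through the embedding function, and a denoising network conditioned on that embedding would be trained to recover `eps` from `x_t`.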
