Joint Embeddings Go Temporal

Sofiane Ennadir, Siavash Golkar, Leopoldo Sarra

TL;DR

This work addresses noise and confounding factors in time-series self-supervised learning by introducing TS-JEPA, a latent-space predictive architecture based on JEPA. TS-JEPA uses a Tokenizer, an Encoder, a Predictor, and an EMA-Encoder to predict the latent representations of masked patches. It is optimized with an $\ell_1$ loss in latent space, $\mathcal{L} = \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} \lVert z'_i - t_i \rVert_1$, where $z'_{\mathcal{M}} = P_{\beta}(E_{\theta}(\mathcal{P}_{\mathcal{N}}))_{\mathcal{M}}$ are the Predictor outputs at the masked positions $\mathcal{M}$ and $t_i$ is the corresponding target. Empirically, TS-JEPA achieves strong classification performance, matches or surpasses MAE, and maintains competitive forecasting while offering improved stability and sample efficiency, highlighting its potential as a robust foundation model for time-series data. The results suggest that latent-space JEPA can provide balanced, transferable representations across classification and forecasting tasks, motivating future scaling and deployment of time-series foundation models.
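As a rough illustration of the loss above, the following NumPy sketch averages the L1 distance between predicted and target latents over the masked positions only. The function name, array shapes, and toy values are assumptions made for this example and do not come from the paper's code.

```python
import numpy as np

def masked_l1_latent_loss(pred, target, mask_idx):
    """Mean over masked tokens of the L1 distance between latents.

    pred, target: arrays of shape (num_tokens, latent_dim) (assumed layout).
    mask_idx: integer indices of the masked tokens (the set M).
    """
    diff = pred[mask_idx] - target[mask_idx]   # keep masked rows only
    per_token_l1 = np.abs(diff).sum(axis=-1)   # ||z'_i - t_i||_1 for each i in M
    return per_token_l1.mean()                 # divide by |M|

# Toy example: 4 tokens with 2-dimensional latents, tokens 1 and 3 masked.
pred = np.zeros((4, 2))
target = np.array([[1.0, 1.0], [1.0, -2.0], [0.0, 0.0], [0.5, 0.5]])
mask_idx = np.array([1, 3])

loss = float(masked_l1_latent_loss(pred, target, mask_idx))
print(loss)  # (3.0 + 1.0) / 2 = 2.0
```

Tokens outside the mask (rows 0 and 2 here) do not contribute, which is why the mismatch in row 0 leaves the result unchanged.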

Abstract

Self-supervised learning has seen great success recently in unsupervised representation learning, enabling breakthroughs in natural language and image processing. However, these methods often rely on autoregressive and masked modeling, which aim to reproduce masked information in the input and can therefore be vulnerable to noise or confounding variables. To address this problem, Joint-Embedding Predictive Architectures (JEPA) have been introduced to perform self-supervised learning in the latent space. To leverage these advancements in the domain of time series, we introduce Time Series JEPA (TS-JEPA), an architecture specifically adapted for time series representation learning. We validate TS-JEPA on both classification and forecasting, showing that it can match or surpass current state-of-the-art baselines on several standard datasets. Notably, our approach demonstrates a strong performance balance across diverse tasks, indicating its potential as a robust foundation for learning general representations. Thus, this work lays the groundwork for developing future time series foundation models based on Joint Embedding.

Paper Structure

This paper contains 7 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Illustration of TS-JEPA: it consists of (1) a Tokenizer, (2) an Encoder that processes the non-masked patches, (3) a Predictor that generates the target predictions from the Encoder's output, and (4) the EMA-Encoder, which encodes the target masked patches.
  • Figure 2: Performance of TS-JEPA against a fully supervised Transformer when trained with fewer labels on the FordA dataset (a) and the FaultDetectionA dataset (b).
  • Figure 2: MSE and MAE of short-term forecasting.
  • Figure 3: Comparison of the Cumulative Mean Square Error on the long-term forecasting task.
  • Figure 4: Effect of Learning Rate on the long-term forecasting for autoregressive models.
  • ...and 2 more figures
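
The EMA-Encoder named in the Figure 1 caption is typically a slowly moving copy of the Encoder whose weights follow an exponential moving average of the Encoder's weights. The sketch below shows that generic update rule. The parameter dictionary layout, the function name, and the momentum value are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def ema_update(target_params, online_params, momentum=0.99):
    """Return EMA-updated target parameters.

    target_params, online_params: dicts mapping parameter names to arrays
    (hypothetical layout). momentum close to 1 makes the target change slowly.
    """
    return {
        name: momentum * target_params[name] + (1.0 - momentum) * online_params[name]
        for name in target_params
    }

# Toy example with one weight vector and an illustrative momentum of 0.9.
target_params = {"w": np.array([1.0, 1.0])}
online_params = {"w": np.array([0.0, 2.0])}

updated = ema_update(target_params, online_params, momentum=0.9)
print(updated["w"])
```

In a setup like this, gradients would update only the Encoder and Predictor, and the target weights would be refreshed with this rule after each optimization step.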