LaT-PFN: A Joint Embedding Predictive Architecture for In-context Time-series Forecasting

Stijn Verdenius, Andrea Zerio, Roy L. M. Wang

TL;DR

LaT-PFN targets zero-shot time series forecasting by learning in-context representations in latent space through a fusion of Prior-data Fitted Networks and Joint Embedding Predictive Architecture, aided by a normalized abstract time-axis and a synthetic context prior. The architecture separately optimizes latent prediction and decoding, and introduces a system-identification regularizer to stabilize embeddings, enabling robust generalization across unseen distributions. It demonstrates superior zero-shot forecasting compared to baselines, produces informative embeddings, and reveals emergent patch-like latent tokens reminiscent of Vision Transformers, suggesting a rudimentary time-series vocabulary. This approach offers a data-efficient foundation model for time series, with broad transfer potential and practical impact across domains requiring rapid, deployable forecasting without retraining on new datasets.

Abstract

We introduce LatentTimePFN (LaT-PFN), a foundational time series model with a strong embedding space that enables zero-shot forecasting. To achieve this, we perform in-context learning in latent space, using a novel integration of the Prior-data Fitted Networks (PFN) and Joint Embedding Predictive Architecture (JEPA) frameworks. We leverage the JEPA framework to create a prediction-optimized latent representation of the underlying stochastic process that generates time series, and combine it with contextual learning using a PFN. Furthermore, we improve on preceding works by using related time series as context and by introducing a normalized abstract time axis. This reduces training time and increases the versatility of the model by allowing any time granularity and forecast horizon. We show that this results in superior zero-shot predictions compared to established baselines. We also demonstrate that our latent space produces informative embeddings of both individual time steps and fixed-length summaries of entire series. Finally, we observe the emergence of multi-step patch embeddings without explicit training, suggesting the model actively learns discrete tokens that encode local structures in the data, analogous to vision transformers.
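To make the idea of a normalized abstract time axis concrete, the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes the history window is rescaled to the interval [0, 1] and the forecast horizon continues past 1 on the same scale; the function name `to_abstract_time` and this particular scaling are hypothetical choices made for illustration.

```python
import numpy as np


def to_abstract_time(history_ts, horizon_ts):
    """Map raw timestamps onto a unitless axis where the history spans [0, 1].

    Assumption (not taken from the paper): the history window is rescaled to
    [0, 1] and the forecast horizon is expressed on that same scale, so it
    lands beyond 1.
    """
    history_ts = np.asarray(history_ts, dtype=float)
    horizon_ts = np.asarray(horizon_ts, dtype=float)
    start, end = history_ts.min(), history_ts.max()
    span = end - start
    if span == 0:
        raise ValueError("history must cover more than one distinct timestamp")
    # Both windows share one affine map, so relative spacing is preserved.
    return (history_ts - start) / span, (horizon_ts - start) / span


# Timestamps in hours: an hourly series and a daily series of the same length
# end up on the same abstract axis, regardless of their original granularity.
hourly_hist, hourly_fut = to_abstract_time([0, 1, 2, 3, 4], [5, 6])
daily_hist, daily_fut = to_abstract_time([0, 24, 48, 72, 96], [120, 144])
```

Under this assumed scaling, a model that only ever sees the abstract axis does not need to know whether a step is an hour or a day, which is one way the granularity independence described in the abstract could be realized.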

Paper Structure

This paper contains 43 sections, 18 equations, 13 figures, and 6 tables.

Figures (13)

  • Figure 1: PFN attention (Müller et al., 2021)
  • Figure 2: JEPA (LeCun, 2022)
  • Figure 3: Our time series forecasting PFN problem statement, compared to preceding works (Dooley et al., 2023; Hollmann et al., 2022)
  • Figure 4: The LaT-PFN architecture. The context is embedded into fixed-length series vectors. These are fed into the PFN Predictor transformer, together with the embedded held-out history prompts, using cross-attention. The latent predictions are compared to the latent target, then decoded behind a stop-gradient and compared to the real targets. Finally, we apply a supervised regularization on the simulation parameters.
  • Figure 5: Production of embeddings and prompts by masking and selection
  • ...and 8 more figures