Time-Aware Prior Fitted Networks for Zero-Shot Forecasting with Exogenous Variables

Andres Potapczynski; Ravi Kiran Selvam; Tatiana Konstantinova; Shankar Ramasubramanian; Malcolm Wolff; Kin G. Olivares; Ruijun Ma; Mengfei Cao; Michael W. Mahoney; Andrew Gordon Wilson; Boris N. Oreshkin; Dmitry Efimov

Abstract

In many time series forecasting settings, the target time series is accompanied by exogenous covariates: promotions and prices in retail demand, temperature in energy load, calendar and holiday indicators in traffic or sales, and grid load or fuel costs in electricity pricing. Ignoring these exogenous signals can substantially degrade forecasting accuracy, particularly when they drive spikes, discontinuities, or regime and phase changes in the target series. Most current time series foundation models (e.g., Chronos, Sundial, TimesFM, TimeMoE, TimeLLM, and LagLlama) ignore exogenous covariates and forecast solely from the numerical history of the target series, which limits their performance. In this paper, we develop ApolloPFN, a prior-data fitted network (PFN) that is time-aware (unlike prior PFNs) and that natively incorporates exogenous covariates (unlike prior univariate forecasters). Our design introduces two major advances: (i) a synthetic data generation procedure tailored to resolve the failure modes that arise when tabular (non-temporal) PFNs are applied to time series; and (ii) time-aware architectural modifications that embed the inductive biases needed to exploit the time series context. We demonstrate that ApolloPFN achieves state-of-the-art results on benchmarks that contain exogenous information, such as M5 and electricity price forecasting.
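For readers unfamiliar with how tabular PFNs such as TabPFN-TS are applied to forecasting, the following is a minimal sketch, under our own assumptions, of casting a series with exogenous covariates as a tabular regression task: each time step becomes a row holding a time index, sine/cosine frequency features, and the covariates; context steps are labeled training rows and horizon steps are unlabeled test rows. The helper name `make_tabular_frame` and the default seasonal periods are hypothetical, and the sketch assumes the covariates (e.g., planned prices or promotion flags) are known over the forecast horizon. It is not ApolloPFN's feature pipeline.

```python
import numpy as np


def make_tabular_frame(y_context, exog, periods=(7.0, 52.0)):
    """Hypothetical helper: series + known covariates -> (X_train, y_train, X_test).

    y_context: shape (T,), observed target values.
    exog: shape (T + H,) or (T + H, k), covariates known for context AND horizon.
    periods: seasonal periods for sine/cosine frequency features (assumed values).
    """
    y_context = np.asarray(y_context, dtype=float)
    exog = np.asarray(exog, dtype=float)
    if exog.ndim == 1:
        exog = exog[:, None]
    n_context = y_context.shape[0]
    n_total = exog.shape[0]
    if n_total <= n_context:
        raise ValueError("exog must also cover the forecast horizon")

    t = np.arange(n_total, dtype=float)
    cols = [t[:, None]]  # raw time index column
    for p in periods:
        angle = 2.0 * np.pi * t / p
        cols.append(np.sin(angle)[:, None])
        cols.append(np.cos(angle)[:, None])
    cols.append(exog)  # exogenous covariates appended as extra feature columns
    X = np.hstack(cols)

    # Context rows act as labeled "training samples"; horizon rows are "test samples".
    return X[:n_context], y_context, X[n_context:]


# Toy usage: a promotion flag that lowers price and spikes demand every 8 steps.
n_context, horizon = 20, 4
promo = (np.arange(n_context + horizon) % 8 == 0).astype(float)
price = 10.0 - 2.0 * promo
y = 5.0 + 10.0 * promo[:n_context]
X_train, y_train, X_test = make_tabular_frame(y, np.column_stack([price, promo]))
print(X_train.shape, X_test.shape)  # (20, 7) (4, 7)
```

In this layout a tabular PFN treats rows as exchangeable, so temporal order enters only through columns such as the time index and the frequency features. This is the gap that the time-aware design described in the abstract is meant to close.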

Paper Structure (21 sections, 5 equations, 6 figures, 4 tables, 2 algorithms)

Introduction
Background
Bayesian inference in prior-data fitted networks
Sample-feature separable Transformer
Failure Modes of TabPFN-TS
ApolloPFN
Synthetic Data Generation
Architectural modifications
Positional Encodings.
Full Attention.
Empirical evaluation
Training Setup
Zero-shot performance with exogenous variables
Zero-shot performance on classical univariate benchmarks
Ablation Studies
...and 6 more sections

Figures (6)

Figure 1: (a) Not using exogenous information can lead to catastrophic forecasting errors. We compare the predictions of ApolloPFN with and without exogenous information for the weekly sales of a real product from the M5 benchmark. Without the price information, the forecaster predicts decreasing demand via context parroting (brown), whereas tracking the price dynamics helps the model focus on the most up-to-date dynamics (pink). (b) Prior-data fitted networks such as TabPFN-TS fail to capture ordered patterns. We compare the predictions of TabPFN-TS and ApolloPFN for a synthetic time series with a recurring pattern: a ramp-up period before a promotion, a spike during the promotion, a ramp-down period, and then a subsequent decrease in demand. The exogenous promotion information is encoded as a binary indicator. Training data is to the left of the black line, and forecasts are to the right.
Figure 2: Failure modes of TabPFN-TS on time series data that ApolloPFN addresses. We illustrate each failure case with a different real time series: one from Tourism Monthly for (a), Tourism Yearly for (b), M5 Weekly for (c), and M1 Monthly for (d). In the plots, the training data is to the left of the black line, and the forecasts are to the right. (a) When TabPFN-TS is not given frequency features, it predicts the average of the prior history (green line). When frequency features are available, TabPFN-TS may capture some temporal patterns but miss others outside the frequency range (e.g., it does not capture the largest spikes). (b) TabPFN-TS struggles to extrapolate trends, especially with short contexts. (c) The predictions of TabPFN-TS erroneously revert to zero, the most common value in the context. (d) The 90% confidence intervals of TabPFN-TS widen substantially to cover previously seen values rather than to reflect uncertainty about the trend of the time series.
Figure 3: The SRNGN graph generation algorithm used by ApolloPFN accelerates learning. We compare test benchmark performance at different training steps for ApolloPFN trained with the random growing network (RGN) algorithm versus our Single Node Growing Network (SRNGN) algorithm. With SRNGN, the model performs better after 20K iterations than it does after 80K iterations with RGN.
Figure 4: Interventions in ApolloPFN to improve performance on time series data: an ablation of RoPE and full attention. We compare the effect of progressively adding RoPE and then full attention on several benchmarks against a TabPFN-TS baseline.
Figure 5: How TabPFN combines attention across features and samples. Taken from Hollmann et al. (2025), the figure illustrates the main components of the TabPFN architecture, described by the Transformer-block equation, plus the translation of the embedding into a Riemann distribution approximation of the PPD $p(y_{\text{test}} \mid \bm{x}_{\text{test}}, \mathcal{D}_{\text{train}})$.
...and 1 more figure
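Figure 5 refers to translating the embedding into a Riemann distribution approximation of the PPD. Under the usual bar-distribution reading (a softmax over fixed bins, with uniform density inside each bin), the sketch below shows how such an output could be turned into a point forecast and the kind of 90% interval shown in Figure 2(d). The function names, bin edges, and logits are illustrative assumptions, not TabPFN's or ApolloPFN's API.

```python
import numpy as np


def bin_probs(logits):
    """Numerically stabilized softmax over the bins."""
    z = np.asarray(logits, dtype=float)
    p = np.exp(z - z.max())
    return p / p.sum()


def riemann_quantiles(logits, edges, qs):
    """Quantiles of a piecewise-uniform density over bins [edges[i], edges[i+1])."""
    edges = np.asarray(edges, dtype=float)
    p = bin_probs(logits)
    # CDF at every bin edge; uniform density inside a bin makes it linear in between.
    cdf = np.concatenate([[0.0], np.cumsum(p)])
    # The inverse CDF is therefore a linear interpolation of edges against CDF values.
    return np.interp(np.asarray(qs, dtype=float), cdf, edges)


def riemann_mean(logits, edges):
    """Mean of the piecewise-uniform density: probability-weighted bin midpoints."""
    edges = np.asarray(edges, dtype=float)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return float(np.dot(bin_probs(logits), mids))


# Illustrative usage: 10 equal-width bins on [0, 10], mass concentrated near 5.
edges = np.linspace(0.0, 10.0, 11)
logits = -0.5 * (np.arange(10) - 4.5) ** 2
lo, hi = riemann_quantiles(logits, edges, [0.05, 0.95])  # 90% central interval
print(riemann_mean(logits, edges), lo, hi)
```

Because the bins are fixed, any quantile can be read off the same output in closed form, so interval width depends directly on how much probability mass the model places in distant bins.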
