Baguan-TS: A Sequence-Native In-Context Learning Model for Time Series Forecasting with Covariates

Linxiao Yang, Xue Jiang, Gezheng Xu, Tian Zhou, Min Yang, ZhaoYang Zhu, Linyuan Geng, Zhipeng Zeng, Qiming Chen, Xinyue Gu, Rong Jin, Liang Sun

Abstract

Transformers enable in-context learning (ICL) for rapid, gradient-free adaptation in time series forecasting, yet most ICL-style approaches rely on tabularized, hand-crafted features, while end-to-end sequence models lack inference-time adaptation. We bridge this gap with a unified framework, Baguan-TS, which integrates raw-sequence representation learning with ICL, instantiated by a 3D Transformer that attends jointly over the temporal, variable, and context axes. To make this high-capacity model practical, we tackle two key hurdles: (i) calibration and training stability, improved with a feature-agnostic, target-space, retrieval-based local calibration; and (ii) output oversmoothing, mitigated via a context-overfitting strategy. On a public benchmark with covariates, Baguan-TS consistently outperforms established baselines, achieving the highest win rate and significant reductions in both point and probabilistic forecasting metrics. Further evaluations across diverse real-world energy datasets demonstrate its robustness, yielding substantial improvements.
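
The target-space retrieval mentioned in the abstract can be pictured as a nearest-neighbour search over past windows of the target series itself, with no covariate features involved. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`znorm`, `retrieve_contexts`), the z-normalisation step, and the Euclidean distance are all hypothetical choices made for illustration, since the abstract does not specify the similarity measure.

```python
import math


def znorm(window):
    """Z-normalise a window so that matching depends on shape, not scale (assumed choice)."""
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for constant windows
    return [(v - mean) / std for v in window]


def retrieve_contexts(y, lookback, horizon, num_contexts):
    """Return the `num_contexts` past episodes whose lookback best matches the latest one.

    Each episode is (start_index, lookback_values, horizon_values). Only the
    target series `y` is used, so no covariate feature engineering is needed.
    """
    query = znorm(y[-lookback:])
    last_start = len(y) - lookback - horizon  # the episode's horizon must be fully observed
    scored = []
    for start in range(last_start + 1):
        candidate = znorm(y[start:start + lookback])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, candidate)))
        scored.append((dist, start))
    scored.sort()  # smallest distance first, ties broken by earlier start
    return [
        (s, y[s:s + lookback], y[s + lookback:s + lookback + horizon])
        for _, s in scored[:num_contexts]
    ]
```

For a strictly periodic series, the retrieved episodes start at the same phase as the query window, so their horizons are plausible templates for the forecast.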

Paper Structure (34 sections, 2 equations, 20 figures, 23 tables)

This paper contains 34 sections, 2 equations, 20 figures, 23 tables.

Figures (20)

  • Figure 1: Three paradigms for time series forecasting: (a) End-to-end sequence models learn from raw histories but lack in-context adaptation at inference. (b) Tabular ICL approaches (e.g., TabPFN) perform ICL over feature-engineered representations. (c) Our unified approach (Baguan-TS) enables sequence-native ICL on raw multivariate inputs, attending over the temporal, variable, and context axes for gradient-free adaptation.
  • Figure 3: Overall architecture of Baguan-TS. The input tensor $\mathcal{Y}\in\mathbb{R}^{(C+1)\times (T+H)\times (M+1)}$ is encoded into patch tokens, then iteratively processed by stacked 3D Transformer blocks performing variable, temporal, and context attention, and finally mapped by a prediction head to produce the forecasting outputs $\mathbf{y}^f\in\mathbb{R}^H$.
  • Figure 4: Context organization strategies and t-SNE visualization. (a) Three context organization methods: (i) uniform splits (green); (ii) covariate-based retrieval in X-space (purple, relies on feature engineering); (iii) target-based retrieval in Y-space (ours, yellow), which focuses on historical patterns and is feature-agnostic. Higher similarity scores indicate stronger contextual relevance. (b) t-SNE plots of prediction horizons on epf (top) and entsoe (bottom) for three RBfcst variants. The ground truth (red star) lies closest to the Y-space RBfcst cluster (shaded), indicating it best captures the true pattern.
  • Figure 5: Illustration of the context-overfitting strategy. (a) Original design, where the model forecasts query targets from retrieved context episodes. (b) Duplicate-context design: a short segment from one context slice is copied into a query, and the model is trained to retrieve the matching context and reconstruct its targets.
  • Figure 6: Effect of the context-overfitting strategy. Main: training loss curves for the baseline model (green) and our context-overfitting strategy (red). Insets: example forecasts compared with ground truth (blue). The baseline oversmooths outputs and misses high-frequency spikes (right); our strategy keeps a low loss while recovering spike patterns by matching contextual templates (left).
  • ...and 15 more figures
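
The Figure 3 caption describes blocks that apply attention along the variable, temporal, and context axes in turn. One way to picture this is to run ordinary scaled dot-product attention along one axis of the token tensor at a time. The sketch below uses NumPy and rests on several assumptions that the caption does not confirm: the token layout `(context, patch, variable, d)`, the single-head attention, the residual connections, and the names `axis_attention` and `block_3d` are all hypothetical, and layer normalisation and feed-forward layers are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axis_attention(tokens, axis, w_q, w_k, w_v):
    """Single-head self-attention along one axis of a (context, patch, variable, d) tensor.

    All other axes are treated as batch dimensions, so tokens only
    exchange information along the chosen axis.
    """
    x = np.moveaxis(tokens, axis, -2)  # (..., length_along_axis, d)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ v
    return np.moveaxis(out, -2, axis)  # restore the original layout


def block_3d(tokens, params):
    """Apply variable, temporal, then context attention, each with a residual connection.

    `params` holds three (w_q, w_k, w_v) tuples, one per axis (assumed layout:
    axis 2 = variable, axis 1 = temporal patches, axis 0 = context).
    """
    for axis, (w_q, w_k, w_v) in zip((2, 1, 0), params):
        tokens = tokens + axis_attention(tokens, axis, w_q, w_k, w_v)
    return tokens
```

Factorising attention by axis keeps each attention matrix small (its size depends on one axis length only), at the cost of mixing information across axes only through successive sub-layers.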