Baguan-TS: A Sequence-Native In-Context Learning Model for Time Series Forecasting with Covariates

Linxiao Yang; Xue Jiang; Gezheng Xu; Tian Zhou; Min Yang; ZhaoYang Zhu; Linyuan Geng; Zhipeng Zeng; Qiming Chen; Xinyue Gu; Rong Jin; Liang Sun

Abstract

Transformers enable in-context learning (ICL) for rapid, gradient-free adaptation in time series forecasting, yet most ICL-style approaches rely on tabularized, hand-crafted features, while end-to-end sequence models lack inference-time adaptation. We bridge this gap with a unified framework, Baguan-TS, which integrates raw-sequence representation learning with ICL, instantiated by a 3D Transformer that attends jointly over the temporal, variable, and context axes. To make this high-capacity model practical, we tackle two key hurdles: (i) calibration and training stability, improved with a feature-agnostic, target-space retrieval-based local calibration; and (ii) output oversmoothing, mitigated via a context-overfitting strategy. On public benchmark data with covariates, Baguan-TS consistently outperforms established baselines, achieving the highest win rate and significant reductions in both point and probabilistic forecasting metrics. Further evaluations across diverse real-world energy datasets demonstrate its robustness, yielding substantial improvements.
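
To make the idea of feature-agnostic, target-space retrieval concrete, the following is a minimal sketch and not the authors' implementation. It assumes that candidate context episodes are sliding windows over the target series and that relevance is measured by plain Euclidean distance between target histories; the paper's actual similarity score may differ. The names `make_windows` and `retrieve_contexts` are hypothetical.

```python
import math


def make_windows(series, history_len, horizon):
    """Slice a 1-D target series into (history, future) pairs."""
    windows = []
    last_start = len(series) - history_len - horizon
    for start in range(last_start + 1):
        split = start + history_len
        windows.append((series[start:split], series[split:split + horizon]))
    return windows


def retrieve_contexts(query_history, candidates, k):
    """Return the k candidates whose target history is closest to the query.

    Only target values are compared, so no covariate features are needed.
    Euclidean distance is an assumption made for this sketch.
    """
    def distance(candidate):
        history, _future = candidate
        return math.dist(query_history, history)

    return sorted(candidates, key=distance)[:k]


# Toy periodic series: the last four points form the query history.
series = [0, 1, 2, 3] * 4
query_history = series[-4:]
candidates = make_windows(series[:-4], history_len=4, horizon=2)
top = retrieve_contexts(query_history, candidates, k=2)
```

In this toy setup the two retrieved episodes share the query's history exactly, so their futures act as templates for what follows the query.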

Paper Structure (34 sections, 2 equations, 20 figures, 23 tables)

Introduction
Related Work
Baguan-TS
Problem Formulation
Architecture
Patching and Tokenization
3D Transformer Block
Prediction Head
Training
Retrieval-Based Forecasting
Context-Overfitting Strategy
Adaptive Inference
Experiments
Experiment Settings
Zero-Shot Forecasting
...and 19 more sections

Figures (20)

Figure 1: Three paradigms for time series forecasting: (a) End-to-end sequence models learn from raw histories but lack in-context adaptation at inference. (b) Tabular ICL approaches (e.g., TabPFN) perform ICL over feature-engineered representations. (c) Our unified approach (Baguan-TS) enables sequence-native ICL on raw multivariate inputs, attending over the temporal, variable, and context axes for gradient-free adaptation.
Figure 3: Overall architecture of Baguan-TS. The input tensor $\mathcal{Y}\in\mathbb{R}^{(C+1)\times (T+H)\times (M+1)}$ is encoded into patch tokens, then iteratively processed by stacked 3D Transformer blocks performing variable, temporal, and context attention, and finally mapped by a prediction head to produce the forecasting outputs $\mathbf{y}^f\in\mathbb{R}^H$.
Figure 4: Context organization strategies and t-SNE visualization. (a) Three context organization methods: (i) uniform splits (green); (ii) covariate-based retrieval in X-space (purple, relies on feature engineering); (iii) target-based retrieval in Y-space (ours, yellow), which focuses on historical patterns and is feature-agnostic. Higher similarity scores indicate stronger contextual relevance. (b) t-SNE plots of prediction horizons on epf (top) and entsoe (bottom) for three RBfcst variants. The ground truth (red star) lies closest to the Y-space RBfcst cluster (shaded), indicating it best captures the true pattern.
Figure 5: Illustration of the context-overfitting strategy. (a) Original design, where the model forecasts query targets from retrieved context episodes. (b) Duplicate-context design: a short segment from one context slice is copied into a query, and the model is trained to retrieve the matching context and reconstruct its targets.
Figure 6: Effect of the context-overfitting strategy. Main: training loss curves for the baseline model (green) and our context-overfitting strategy (red). Insets: example forecasts compared with ground truth (blue). The baseline oversmooths outputs and misses high-frequency spikes (right); our strategy keeps a low loss while recovering spike patterns by matching contextual templates (left).
...and 15 more figures
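
The input layout in the Figure 3 caption, $\mathcal{Y}\in\mathbb{R}^{(C+1)\times (T+H)\times (M+1)}$, can be illustrated with a small sketch. The caption does not define the symbols, so this example assumes one plausible reading: $C$ retrieved context episodes plus one query, $T$ history steps plus $H$ horizon steps, and $M$ covariates plus one target stored last. It further assumes that the query's future covariates are known and that its future targets are hidden behind a placeholder. The helper names `stack_episodes` and `shape` are hypothetical.

```python
import math


def stack_episodes(contexts, query, horizon, mask_value=float("nan")):
    """Stack C context episodes and one query along the context axis.

    Each episode is a list of (T+H) time steps; each step is a list of
    (M+1) values with the target last (an assumed convention). The query's
    last `horizon` targets are replaced by `mask_value` because they are
    what the model has to forecast.
    """
    masked_query = [step[:] for step in query]  # copy, leave the input intact
    for step in masked_query[-horizon:]:
        step[-1] = mask_value
    return contexts + [masked_query]


def shape(tensor):
    """Return (context, time, variable) sizes of the nested list."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


def toy_episode(offset, length=5):
    """One covariate (the time index) and one target (offset + index)."""
    return [[float(t), float(offset + t)] for t in range(length)]


# C = 2 contexts, T = 3, H = 2, M = 1.
contexts = [toy_episode(0), toy_episode(10)]
query = toy_episode(20)
tensor = stack_episodes(contexts, query, horizon=2)
```

Context episodes keep their future targets visible, while the query keeps only its history and covariates; attention along the first axis is then what would let the query borrow patterns from the contexts.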
