Cross-RAG: Zero-Shot Retrieval-Augmented Time Series Forecasting via Cross-Attention

Seunghan Lee, Jaehoon Lee, Jun Seo, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

Abstract

Recent advances in time series foundation models (TSFMs) demonstrate strong expressive capacity through large-scale pretraining across diverse time series domains. Zero-shot time series forecasting with TSFMs, however, exhibits limited generalization to unseen datasets, which retrieval-augmented forecasting addresses by leveraging an external knowledge base. Existing approaches rely on a fixed number of retrieved samples that may introduce irrelevant information. To this end, we propose Cross-RAG, a zero-shot retrieval-augmented forecasting framework that selectively attends to query-relevant retrieved samples. Cross-RAG models input-level relevance between the query and retrieved samples via query-retrieval cross-attention, while jointly incorporating information from the query and retrieved samples. Extensive experiments demonstrate that Cross-RAG consistently improves zero-shot forecasting performance across various TSFMs and RAG methods, and additional analyses confirm its effectiveness across diverse retrieval scenarios. Code is available at https://github.com/seunghan96/cross-rag/.

Paper Structure (25 sections, 6 equations, 22 figures, 11 tables)

Figures (22)

  • Figure 1: Cross-attention between query and retrieved inputs. Previous works aggregate retrieved samples without explicitly modeling the relationship between the query and the retrieved inputs, whereas Cross-RAG performs input-aware fusion, using cross-attention to weight retrieved samples by their input similarity to the query.
  • Figure 2: Experiments with and without cross-attention. (a) With cross-attention, performance improves as the number of retrieved samples ($k$) increases. (b) Zero-shot forecasting results on toy datasets with $k=10$ retrieved samples for both methods, with and without cross-attention [ning2025ts]; our method accurately matches the ground truth.
  • Figure 3: Overall Framework of Cross-RAG. Cross-RAG fuses retrieved information through two branches: (1) Query--retrieval cross-attention models relevance between the query and retrieved inputs and aggregates retrieved outputs conditioned on this relevance. (2) Retrieval self-attention summarizes retrieved outputs in a query-independent manner to capture contextual information among retrieved samples. The TSFM backbone and predictor are frozen, and only the additional modules are trained on general pretraining datasets.
  • Figure 4: Necessity of selective attention. The optimal $k$ varies both across and within datasets ([A] ETTh1, [B] Exchange, [C] Weather), highlighting the need to attend selectively to a subset of the retrieved samples.
  • Figure 5: Results of zero-shot forecasting across TSFMs. The proposed method outperforms existing TSFMs across diverse datasets, achieving up to a 4.8% MSE improvement over the best baseline. The best and second-best results are indicated in bold and underline, respectively.
  • ...and 17 more figures
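
The two fusion branches described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`cross_attention_fusion`, `self_attention_summary`), the embedding shapes, and the use of plain scaled dot-product attention without learned projections are all assumptions made for illustration. How the two branch outputs are combined and passed to the frozen predictor is not stated in the captions, so the sketch leaves them separate.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fusion(query_emb, retrieved_in_emb, retrieved_out_emb):
    """Hypothetical query-retrieval cross-attention branch.

    query_emb:         (d,)   embedding of the query input
    retrieved_in_emb:  (k, d) embeddings of the k retrieved inputs (keys)
    retrieved_out_emb: (k, d) embeddings of the k retrieved outputs (values)
    Returns the fused (d,) vector and the (k,) relevance weights.
    """
    d = query_emb.shape[-1]
    # Relevance of each retrieved input to the query (scaled dot product).
    scores = retrieved_in_emb @ query_emb / np.sqrt(d)
    weights = softmax(scores)
    # Aggregate retrieved outputs conditioned on input-level relevance.
    fused = weights @ retrieved_out_emb
    return fused, weights


def self_attention_summary(retrieved_out_emb):
    """Hypothetical query-independent branch: self-attention among the
    retrieved outputs, followed by mean pooling (pooling is an assumption)."""
    d = retrieved_out_emb.shape[-1]
    scores = retrieved_out_emb @ retrieved_out_emb.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    return (attn @ retrieved_out_emb).mean(axis=0)


# Toy usage: k = 10 retrieved samples, d = 16; only sample 3 resembles the query.
k, d = 10, 16
query = np.ones(d)
retrieved_in = np.zeros((k, d))
retrieved_in[3] = 1.0
retrieved_out = np.linspace(0.0, 1.0, k * d).reshape(k, d)

fused, weights = cross_attention_fusion(query, retrieved_in, retrieved_out)
summary = self_attention_summary(retrieved_out)
```

In this toy setup the retrieved sample whose input matches the query receives the largest weight, while the remaining samples share the rest equally. That is the kind of selective weighting the captions contrast with fixed-$k$ aggregation.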