TimeRAG: BOOSTING LLM Time Series Forecasting via Retrieval-Augmented Generation
Silin Yang, Dong Wang, Haoqi Zheng, Ruochun Jin
TL;DR
The paper addresses two limitations of applying large language models (LLMs) to time series forecasting: the cost of adapting them to new domains and their tendency to hallucinate. Its answer is TimeRAG, which builds a time series knowledge base from training data through sliding-window segmentation and K-means clustering, retrieves reference sequences similar to the query using Dynamic Time Warping (DTW), and passes a natural-language prompt combining the query and the references to a frozen LLM. On the M4 dataset, TimeRAG improves forecasting accuracy by 2.97% on average and performs robustly across sampling frequencies, all without modifying the LLM's parameters. The approach shows how retrieval-augmented generation can improve cross-domain forecasting by injecting knowledge through retrieval rather than fine-tuning, while the retrieved references keep the prediction context interpretable.
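The knowledge-base construction step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the window length, stride, and cluster count are hypothetical parameters, K-means is standard Lloyd's algorithm with random initial centroids, and keeping the segment closest to each centroid as that cluster's representative is an assumption about how the clustered segments populate the knowledge base.

```python
import math
import random
from typing import List, Sequence, Tuple

Segment = List[float]


def sliding_windows(series: Sequence[float], window: int, stride: int) -> List[Segment]:
    """Cut one series into fixed-length, possibly overlapping segments."""
    return [list(series[i:i + window]) for i in range(0, len(series) - window + 1, stride)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Segment], k: int, iters: int = 20,
           seed: int = 0) -> Tuple[List[Segment], List[int]]:
    """Plain Lloyd's K-means with random initial centroids (an assumed choice)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each segment joins its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def build_knowledge_base(series_list: List[Sequence[float]], window: int,
                         stride: int, k: int) -> List[Segment]:
    """Hypothetical KB builder: segment, cluster, keep one representative per cluster."""
    segments = [seg for s in series_list for seg in sliding_windows(s, window, stride)]
    centroids, labels = kmeans(segments, k)
    kb = []
    for c in range(k):
        members = [seg for seg, lab in zip(segments, labels) if lab == c]
        if members:
            # Assumption: the member nearest the centroid represents the cluster.
            kb.append(min(members, key=lambda seg: euclidean(seg, centroids[c])))
    return kb
```

For example, `build_knowledge_base([[0.0] * 8, [100.0] * 8], window=4, stride=2, k=2)` returns two entries, one flat segment at 0 and one at 100, because the six windows fall into two clean clusters. Clustering keeps the knowledge base compact, so retrieval compares the query against a few representatives instead of every training window.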
Abstract
Although the rise of large language models (LLMs) has opened new opportunities for time series forecasting, existing LLM-based solutions require excessive training and exhibit limited transferability. To address these challenges, we propose TimeRAG, a framework that incorporates Retrieval-Augmented Generation (RAG) into time series forecasting LLMs. TimeRAG constructs a time series knowledge base from historical sequences, retrieves from it the reference sequences whose patterns are most similar to the query sequence as measured by Dynamic Time Warping (DTW), and combines these references with the prediction query into a textual prompt for the forecasting LLM. Experiments on datasets from various domains show that integrating RAG improves the prediction accuracy of the original model by 2.97% on average.
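The retrieval and prompting steps can be sketched in the same way. This sketch assumes the classic dynamic-programming DTW with absolute difference as the local cost, ranks knowledge-base entries by their DTW distance to the query, and uses a hypothetical prompt template (`build_prompt` and its wording are illustrative); the paper's actual distance settings and prompt format may differ, and the call to the frozen LLM is not shown.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) dynamic-time-warping distance with |x - y| local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: diagonal match, step in a only, step in b only.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def retrieve(query: Sequence[float], knowledge_base: List[Sequence[float]],
             top_k: int = 2) -> List[Sequence[float]]:
    """Return the top_k knowledge-base entries with the smallest DTW distance."""
    return sorted(knowledge_base, key=lambda ref: dtw_distance(query, ref))[:top_k]


def fmt(values: Sequence[float]) -> str:
    return ", ".join(f"{v:g}" for v in values)


def build_prompt(query: Sequence[float], references: List[Sequence[float]],
                 horizon: int) -> str:
    """Hypothetical prompt template; not the paper's exact wording."""
    lines = ["Reference sequences with patterns similar to the query:"]
    for idx, ref in enumerate(references, start=1):
        lines.append(f"Reference {idx}: {fmt(ref)}")
    lines.append(f"Query sequence: {fmt(query)}")
    lines.append(f"Predict the next {horizon} values of the query sequence.")
    return "\n".join(lines)
```

Because DTW aligns sequences elastically, `dtw_distance([0, 0, 1], [0, 1])` is 0 even though the two sequences have different lengths, a pair that a point-wise Euclidean distance could not compare directly. This tolerance to shifts and stretches is what makes DTW a reasonable choice for matching a query against stored patterns.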
