Large Language Models for Time Series: A Survey
Xiyuan Zhang, Ranak Roy Chowdhury, Rajesh K. Gupta, Jingbo Shang
TL;DR
The paper addresses how to harness large language models (LLMs) for time series analysis by bridging the modality gap between text-trained LLMs and numerical time series data. It proposes a five-category taxonomy (prompting, quantization, aligning, vision as bridge, and tool integration) and outlines representative methods for each category. It also gives a formal input/output framework: the input is $x = (x_s, x_t)$ with $x_s \in \mathbb{R}^{T \times c}$, and the output $y$ can be a time series, text, or a number. The survey compiles representative works, introduces multimodal datasets, and discusses challenges and future directions across theory, multimodal multitask learning, efficiency, domain knowledge integration, customization, and privacy. By organizing existing work into this framework, the paper supports scalable, multimodal time series analysis across domains such as climate, IoT, healthcare, traffic, and finance, and highlights open problems in building robust, scalable time series foundation models.
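To make the input/output notation concrete, here is a minimal Python sketch of a sample $x = (x_s, x_t)$ and the possible types of $y$. The names `Sample` and `Output` are hypothetical and do not come from the paper. The sketch also assumes that $T$ is the number of time steps, $c$ is the number of channels, and $x_t$ is an optional piece of accompanying text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Output y: a time series, a text string, or a single number.
# (Hypothetical alias, used only for illustration.)
Output = Union[List[List[float]], str, float]


@dataclass
class Sample:
    """One input x = (x_s, x_t); the class name is an assumption."""

    x_s: List[List[float]]     # T rows, each holding c channel values
    x_t: Optional[str] = None  # assumed: optional text paired with the series

    def __post_init__(self) -> None:
        if not self.x_s:
            raise ValueError("x_s must contain at least one time step")
        c = len(self.x_s[0])
        if any(len(row) != c for row in self.x_s):
            raise ValueError("every time step must have the same number of channels")

    @property
    def shape(self) -> Tuple[int, int]:
        """Return (T, c), matching x_s in R^{T x c}."""
        return (len(self.x_s), len(self.x_s[0]))


sample = Sample(
    x_s=[[20.1, 0.35], [20.4, 0.36], [20.9, 0.33]],
    x_t="hourly temperature and humidity",
)
print(sample.shape)  # (3, 2)
```

Here the series has $T = 3$ time steps and $c = 2$ channels, and a forecasting task would return another nested list as $y$, while a classification or question-answering task would return text or a number.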
Abstract
Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision. Going beyond text, images, and graphics, LLMs show significant potential for the analysis of time series data, benefiting domains such as climate, IoT, healthcare, traffic, audio, and finance. This survey paper provides an in-depth exploration and a detailed taxonomy of the various methodologies employed to harness the power of LLMs for time series analysis. We address the inherent challenge of bridging the gap between LLMs' original training on text data and the numerical nature of time series data, and explore strategies for transferring and distilling knowledge from LLMs to numerical time series analysis. We detail various methodologies, including (1) direct prompting of LLMs, (2) time series quantization, (3) aligning techniques, (4) utilization of the vision modality as a bridging mechanism, and (5) the combination of LLMs with tools. Additionally, this survey offers a comprehensive overview of the existing multimodal time series and text datasets and delves into the challenges and future opportunities of this emerging field. We maintain an up-to-date GitHub repository which includes all the papers and datasets discussed in the survey.
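As a rough illustration of the first two methodology families, the sketch below serializes a series into a text prompt (direct prompting) and maps values to discrete token ids with uniform binning (a simple form of quantization). The function names, the prompt wording, and the choice of uniform bins are assumptions made for illustration; the surveyed methods may use different serialization formats and learned or clustered codebooks.

```python
from typing import List


def to_prompt(values: List[float], decimals: int = 1) -> str:
    """Direct prompting: render numbers as text an LLM can continue.

    The prompt wording is a hypothetical example, not taken from the survey.
    """
    series = ", ".join(f"{v:.{decimals}f}" for v in values)
    return f"Continue the sequence: {series},"


def quantize(values: List[float], n_bins: int = 4) -> List[int]:
    """Quantization: map each value to one of n_bins equal-width bin ids.

    Uniform binning is assumed here for simplicity.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant series: a single bin suffices
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls in the last bin rather than overflowing.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


readings = [0.0, 1.0, 2.0, 3.0, 4.0]
print(to_prompt(readings))  # Continue the sequence: 0.0, 1.0, 2.0, 3.0, 4.0,
print(quantize(readings))   # [0, 1, 2, 3, 3]
```

In both cases the numeric series is turned into something a text-trained model can consume: a string in the first case, and a sequence of discrete ids that could index a token vocabulary in the second.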
