Large Language Models for Time Series: A Survey

Xiyuan Zhang; Ranak Roy Chowdhury; Rajesh K. Gupta; Jingbo Shang

TL;DR

The paper addresses how to harness large language models for time series analysis by bridging the modality gap between text-trained LLMs and numerical time series data. It proposes a five-category taxonomy (prompting, quantization, aligning, vision as bridge, and tool integration) and a formal input/output framework in which the input is $x = (x_s, x_t)$ with $x_s \in \mathbb{R}^{T \times c}$ and the output $y$ can be a time series, text, or a number, and it outlines representative methods for each category. The survey compiles representative works, introduces multimodal datasets, and discusses challenges and future directions across theory, multimodal multitask learning, efficiency, domain knowledge integration, customization, and privacy. By organizing existing work in this way, the paper provides a foundation for scalable, multimodal time series analysis across domains such as climate, IoT, healthcare, traffic, and finance, and highlights open problems in building robust, scalable time-series foundation models.
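To make the input/output notation concrete, the following is a minimal Python sketch of one way the pair $(x_s, x_t)$ could be held in code. The class name `TimeSeriesInput`, the `Output` alias, and the reading of `x_t` as an optional text string are assumptions made for illustration; the summary above only fixes the symbols and the shape of `x_s`.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical alias: the output y may be a time series, text, or a number.
Output = Union[List[List[float]], str, float]


@dataclass
class TimeSeriesInput:
    """Illustrative container for x = (x_s, x_t); the name is an assumption."""
    x_s: List[List[float]]     # T rows (time steps), each holding c channel values
    x_t: Optional[str] = None  # accompanying text, assumed optional here

    def shape(self):
        """Return (T, c) after checking that every time step has c channels."""
        T = len(self.x_s)
        c = len(self.x_s[0]) if T else 0
        if any(len(row) != c for row in self.x_s):
            raise ValueError("every time step must have the same number of channels")
        return (T, c)


# Three time steps, two channels, plus a short text description.
example = TimeSeriesInput(
    x_s=[[20.1, 0.3], [20.4, 0.2], [21.0, 0.1]],
    x_t="hourly temperature and rainfall",
)
print(example.shape())
```

Here `T = 3` and `c = 2`, so `x_s` plays the role of a $T \times c$ real matrix, with plain nested lists used only to keep the sketch dependency-free.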

Abstract

Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision. Going beyond text, images, and graphics, LLMs have significant potential for the analysis of time series data, benefiting domains such as climate, IoT, healthcare, traffic, audio, and finance. This survey provides an in-depth exploration and a detailed taxonomy of the methodologies used to harness the power of LLMs for time series analysis. We address the inherent challenge of bridging the gap between the text data on which LLMs were originally trained and the numerical nature of time series data, and explore strategies for transferring and distilling knowledge from LLMs to numerical time series analysis. We detail various methodologies, including (1) direct prompting of LLMs, (2) time series quantization, (3) aligning techniques, (4) use of the vision modality as a bridging mechanism, and (5) the combination of LLMs with tools. Additionally, this survey offers a comprehensive overview of existing multimodal time series and text datasets and examines the challenges and future opportunities of this emerging field. We maintain an up-to-date GitHub repository that includes all the papers and datasets discussed in the survey.
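As a rough illustration of the first two strategies listed in the abstract, the sketch below serializes a series into a text prompt (prompting) and maps values to discrete token indices (quantization). Both functions, their names, the prompt wording, and the choice of uniform binning are assumptions made for this example; they are stand-ins for the general idea and not the procedure of any particular surveyed method.

```python
from typing import List


def to_prompt(values: List[float], horizon: int, decimals: int = 1) -> str:
    """Prompting: write the numbers out as text so a text-only LLM can read them."""
    series = ", ".join(f"{v:.{decimals}f}" for v in values)
    return f"Series: {series}. Predict the next {horizon} values."


def quantize(values: List[float], n_bins: int) -> List[int]:
    """Quantization: map each value to one of n_bins indices using uniform bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series falls entirely into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


readings = [20.0, 20.5, 21.0, 22.0]
print(to_prompt(readings, horizon=2))
print(quantize(readings, n_bins=4))
```

The first function leaves the model untouched and changes only the input text, while the second turns the series into a sequence over a small discrete vocabulary that could then be treated like tokens.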

Paper Structure (18 sections, 3 equations, 4 figures, 3 tables)

Introduction
Background and Problem Formulation
Taxonomy
Prompting
Quantization
Aligning
Vision as Bridge
Tool
Comparison within the Taxonomy
Multimodal Datasets
Challenges and Future Directions
Theoretical Understanding
Multimodal and Multitask Analysis
Efficient Algorithms
Combining Domain Knowledge
...and 3 more sections

Figures (4)

Figure 1: Large language models have recently been applied for various time series tasks in diverse application domains.
Figure 2: Left: Taxonomy of LLMs for time series analysis (prompting, quantization, aligning, which is further divided into two groups as detailed in Figure 4, vision as bridge, and tool integration). For each category, key distinctions are drawn in comparison to the standard LLM pipeline shown at the top of the figure. Right: Representative works for each category, sorted by publication date. Arrows indicate that later works build upon earlier studies. Dark (light) boxes represent billion (million)-parameter models. Icons to the left of the text boxes represent the application domains of domain-specific models, with the icons' meanings illustrated in Figure 1.
Figure 3: Two types of index-based quantization methods.
Figure 4: Two types of aligning-based methods.
