Position: Empowering Time Series Reasoning with Multimodal LLMs

Yaxuan Kong, Yiyuan Yang, Shiyu Wang, Chenghao Liu, Yuxuan Liang, Ming Jin, Stefan Zohren, Dan Pei, Yan Liu, Qingsong Wen

TL;DR

The paper argues that time-series reasoning benefits from multimodal large language models (MLLMs) that fuse numerical data with text, images, audio, and external knowledge to enable deeper, context-aware inferences. It defines time-series reasoning, outlines a four-part framework (new paradigm, beyond classical tasks, resources, and future directions), and proposes four model-design archetypes (zero-shot, one-stage, two-stage, multimodal) to support robust reasoning. It then introduces tasks beyond traditional forecasting (time-series question answering, causal inference, and generation/editing) and demonstrates their practical potential across domains with contextual guidance and iterative feedback. The work also discusses datasets, evaluation metrics, training strategies, and key challenges (trust, interpretability, cost, confidentiality), and calls for realistic benchmarks and open-source tools to realize these ideas in real-world settings.

Abstract

Understanding time series data is crucial for multiple real-world applications. While large language models (LLMs) show promise in time series tasks, current approaches often rely on numerical data alone, overlooking the multimodal nature of time-dependent information, such as textual descriptions, visual data, and audio signals. Moreover, these methods underutilize LLMs' reasoning capabilities, limiting the analysis to surface-level interpretations instead of deeper temporal and multimodal reasoning. In this position paper, we argue that multimodal LLMs (MLLMs) can enable more powerful and flexible reasoning for time series analysis, enhancing decision-making and real-world applications. We call on researchers and practitioners to leverage this potential by developing strategies that prioritize trust, interpretability, and robust reasoning in MLLMs. Lastly, we highlight key research directions, including novel reasoning paradigms, architectural innovations, and domain-specific applications, to advance time series reasoning with MLLMs.

Paper Structure

This paper contains 45 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: MLLMs integrate multimodal time series and external knowledge, enhancing reasoning and expanding time-series tasks.
  • Figure 2: Key components for achieving time series reasoning (illustrated with financial time series example).
  • Figure 3: Different categories of advanced time series reasoning tasks and architectures.
  • Figure 4: Time series tasks in the age of time series reasoning.
  • Figure 5: Zero-shot open-question performance of different LLMs and input settings for a healthcare application.
  • ...and 2 more figures