How Can Time Series Analysis Benefit From Multiple Modalities? A Survey and Outlook

Haoxin Liu, Harshavardhan Kamarthi, Zhiyuan Zhao, Shangqing Xu, Shiyu Wang, Qingsong Wen, Tom Hartvigsen, Fei Wang, B. Aditya Prakash

TL;DR

This paper presents the first comprehensive survey of Multiple Modalities for Time Series Analysis (MM4TSA), outlining three core approaches: TimeAsX (reusing foundation models from other modalities), Time+X (multimodal extensions), and Time2X/X2Time (cross-modality interaction). It categorizes work by modality (text, image, audio, table) and domain (finance, medicine, spatial-temporal), discusses practical datasets and fusion strategies, and identifies key gaps such as modality selection, heterogeneous integration, and unseen-task generalization. The authors propose benchmarks and reasoning-based methods to advance the field and provide an up-to-date GitHub resource with papers and datasets. Overall, the survey highlights the value of multimodal information for improving TSA performance, interpretability, and applicability across diverse domains.

Abstract

Time series analysis (TSA) is a longstanding research topic in the data mining community and has wide real-world significance. Compared to "richer" modalities such as language and vision, which have recently experienced explosive development and are densely connected, the time-series modality remains relatively underexplored and isolated. We observe that many recent TSA works form a new research field, namely Multiple Modalities for TSA (MM4TSA). In general, these MM4TSA works share a common motivation: how TSA can benefit from multiple modalities. This survey is the first to offer a comprehensive review and a detailed outlook for this emerging field. Specifically, we systematically discuss three benefits: (1) reusing foundation models of other modalities for efficient TSA, (2) multimodal extension for enhanced TSA, and (3) cross-modality interaction for advanced TSA. Within each perspective, we further group the works by the type of modality introduced, including text, images, audio, tables, and others. Finally, we identify gaps and future opportunities corresponding to the three benefits: selecting which modalities to reuse, combining heterogeneous modalities, and generalizing to unseen tasks. We release an up-to-date GitHub repository that includes key papers and resources.

Paper Structure

This paper contains 67 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Illustration of three approaches of multiple modalities for TSA (MM4TSA). These three MM4TSA approaches empower TSA from modality reusing, multimodal enhancement, and cross-modal interaction, respectively.
  • Figure 2: A comprehensive taxonomy of MM4TSA. Multiple Modalities for TSA (MM4TSA) is organized into four stages: the benefit approaches (i.e., TimeAsX, Time+X, Time2X & X2Time), followed by modality types (i.e., text, image, audio, and table, if available), domain-specific applications (i.e., financial, medical, and spatial-temporal TSA), and finally the gaps and outlooks. For branches with extensive existing research, especially Time As Text, Time As Image, and Time + Text, we further divide them into more detailed subcategories.
  • Figure 3: Taxonomy of Modality Fusion Solutions.