
Exploring the Effectiveness and Interpretability of Texts in LLM-based Time Series Models

Zhengke Sun, Hangwei Qian, Ivor Tsang

TL;DR

The paper examines whether the textual modality built into time-series LLM (TS-LLM) frameworks improves forecasting and interpretability. Through extensive experiments on TimeLLM and CALF across multiple datasets and language models, it finds that textual inputs often fail to yield consistent forecasting gains, and that interpretability remains limited because time-series signals and textual representations are misaligned. A new Semantic Matching Index (SMI) is introduced to quantify the semantic alignment between patches and text token sets, revealing misalignment that depends on both the model and the method. The work emphasizes the need for more robust cross-modal alignment and interpretability strategies in TS-LLMs, and it provides code for reproducing the long-term, few-shot, and zero-shot forecasting experiments.

Abstract

Large Language Models (LLMs) have been applied to time series forecasting tasks, leveraging pre-trained language models as the backbone and incorporating textual data to purportedly enhance the comprehensive capabilities of LLMs for time series. However, are these texts really helpful for interpretation? This study investigates the actual efficacy and interpretability of such textual incorporations. Through a series of empirical experiments on textual prompts and textual prototypes, our findings reveal that misalignment exists between the two modalities, and that the textual information does not significantly improve time series forecasting performance in many cases. Furthermore, visualization analysis indicates that the textual representations learned by existing frameworks lack sufficient interpretability when applied to time series data. We further propose a novel metric named Semantic Matching Index (SMI) to better evaluate the degree of matching between time series and texts during our post hoc interpretability investigation. Our analysis reveals the misalignment and limited interpretability of texts in current time-series LLMs, and we hope this study can raise awareness of the interpretability of texts for time series. The code is available at https://github.com/zachysun/TS-Lang-Exp.
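The kind of patch-to-text alignment that the SMI is meant to measure can be illustrated with a plain cosine-similarity check. The SMI's actual formula is not given in this summary, so the sketch below does not implement it: `alignment_score` is a hypothetical stand-in that, for each patch embedding, takes the highest cosine similarity to any embedding in a text token set and averages over patches. The function names and toy vectors are assumptions made for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def alignment_score(patches: Sequence[Vector], token_set: Sequence[Vector]) -> float:
    """Hypothetical patch/text alignment score (NOT the paper's SMI).

    For each patch embedding, keep its best cosine similarity against the
    token set, then average those best matches over all patches.
    """
    if not patches or not token_set:
        raise ValueError("patches and token_set must be non-empty")
    best = [max(cosine(p, t) for t in token_set) for p in patches]
    return sum(best) / len(best)


# Toy 3-d embeddings standing in for aligned patch and word embeddings.
patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
tokens_narrow = [[1.0, 0.0, 0.0]]                   # matches only the first patch
tokens_broad = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # matches both patches

print(alignment_score(patches, tokens_narrow))  # 0.5
print(alignment_score(patches, tokens_broad))   # 1.0
```

A score near 1 would mean that every patch has a close textual counterpart. Under the kind of misalignment the paper reports, such scores would stay low. A real analysis would take its embeddings from the backbone language model rather than from toy vectors.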

Paper Structure

This paper contains 34 sections, 3 equations, 16 figures, 28 tables.

Figures (16)

  • Figure 1: Overview of TimeLLM and CALF
  • Figure 2: Similarity between text prototypes and selected words in TimeLLM.
  • Figure 3: Attention of cross-modality alignment module in TimeLLM.
  • Figure 4: Similarity between aligned time series embeddings and selected words.
  • Figure 5: Patches belonging to token set ‘Gbeetle’. The bold black line indicates the mean value, and the blue area represents the range within one standard deviation. For more token sets, see the GPT-2 and BERT figures in Appendix C.4.
  • ...and 11 more figures
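Figures 2 to 4 probe TimeLLM's cross-modality alignment module. As we understand TimeLLM's published design, this module builds a small set of text prototypes from the backbone's word embeddings and lets patch embeddings attend over them. The sketch below is a simplified, single-head illustration under that assumption and is not TimeLLM's code. The function names, the toy vectors, and the fixed mixing weights are hypothetical, and the learned query/key/value projections are dropped.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def make_prototypes(word_emb: Matrix, mixing: Matrix) -> Matrix:
    """Each text prototype is a weighted combination of word embeddings.
    (Assumption: TimeLLM learns these weights; here they are fixed toy values.)"""
    dim = len(word_emb[0])
    return [
        [sum(m * w[j] for m, w in zip(row, word_emb)) for j in range(dim)]
        for row in mixing
    ]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(patches: Matrix, prototypes: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention: patches are the queries and
    prototypes serve as both keys and values. Learned projections are
    omitted (identity), a simplification of the real module."""
    dim = len(prototypes[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for q in patches:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in prototypes]
        w = softmax(scores)
        weights.append(w)
        # The aligned patch embedding is the attention-weighted sum of prototypes.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, prototypes)) for j in range(dim)])
    return outputs, weights


# Toy 2-d "word embeddings" for three hypothetical words.
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mixing = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
prototypes = make_prototypes(words, mixing)  # [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patches = [[2.0, 0.0], [0.0, 2.0]]
aligned, attn = cross_attention(patches, prototypes)
for row in attn:
    print([round(w, 3) for w in row])
```

The rows of `attn` are patch-to-prototype weights of the kind an attention map such as Figure 3 would display. Cosine similarity between `prototypes` (or `aligned`) and word vectors corresponds to the comparisons in Figures 2 and 4. If the prototypes do not sit close to any meaningful word, which is the limited interpretability the paper reports, these maps say little in human terms.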