Hierarchical Multimodal LLMs with Semantic Space Alignment for Enhanced Time Series Classification

Xiaoyu Tao, Tingyue Pan, Mingyue Cheng, Yucong Luo, Qi Liu, Enhong Chen

TL;DR

HiTime addresses the challenge of applying LLMs to time series classification by introducing a hierarchical temporal encoder, a dual-view semantic space alignment, and a parameter-efficient generative fine-tuning pipeline. The method bridges structured temporal dynamics with linguistic semantics, enabling LLMs to perform classification through generative prompts and subsequent keyword grounding. Extensive experiments on ten UEA datasets demonstrate strong performance gains and insights from ablations on encoders, alignment strategies, and prompt design. The work highlights practical trade-offs between semantic reasoning and local temporal cues, offering a scalable path for multimodal time series analysis with large language models.
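The keyword grounding step mentioned above can be pictured as mapping the LLM's free-form output back onto a fixed label set. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `ground_label`, the `CLASS_KEYWORDS` table, and the "earliest match wins" rule are all hypothetical choices made for illustration.

```python
import re

def ground_label(generated_text, class_keywords, default=None):
    """Map free-form generated text to a class label by keyword matching.

    Returns the label whose keyword appears earliest in the text as a
    whole word, or `default` when no keyword is found.
    (Assumed rule for illustration; the paper may ground labels differently.)
    """
    text = generated_text.lower()
    best_label, best_pos = default, len(text) + 1
    for label, keywords in class_keywords.items():
        for keyword in keywords:
            # Whole-word match so that "run" does not fire inside "runway".
            match = re.search(r"\b" + re.escape(keyword.lower()) + r"\b", text)
            if match and match.start() < best_pos:
                best_label, best_pos = label, match.start()
    return best_label

# Hypothetical label set, not taken from any UEA dataset.
CLASS_KEYWORDS = {
    "walking": ["walking", "walk"],
    "running": ["running", "run"],
    "standing": ["standing", "stand"],
}

if __name__ == "__main__":
    answer = "The pattern suggests the subject is running."
    print(ground_label(answer, CLASS_KEYWORDS))
```

Returning `default` for unmatched outputs keeps the failure case explicit, so generations that name no known class can be counted separately rather than silently assigned a label.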

Abstract

Time series classification plays a fundamental role in a wide range of real-world applications. Recently, large language models (LLMs) have demonstrated strong generalization and reasoning capacities, but directly applying them to time series classification remains non-trivial due to the representation gap between numerical sequences and linguistic semantics. In this paper, we propose HiTime, a hierarchical LLM-based framework for multimodal time series classification that bridges structured temporal representations with semantic reasoning in a generative paradigm. Specifically, we design a hierarchical sequence feature encoding module composed of a data-specific encoder and a task-specific encoder to extract complementary temporal features. To mitigate the embedding gap between time series representations and textual semantics, we further introduce a semantic space alignment module that jointly performs coarse-grained global modeling and fine-grained cross-modal correspondence. Building upon the above representations, we employ a parameter-efficient supervised fine-tuning strategy to activate the generative classification capability of the aligned LLMs, thereby transforming conventional discriminative time series classification into a generative task. Extensive experiments on multiple benchmarks demonstrate that the proposed framework consistently outperforms state-of-the-art baselines. The code is publicly available at https://github.com/Xiaoyu-Tao/HiTime.
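To make the idea of combining a coarse-grained global view with a fine-grained cross-modal view more concrete, here is a rough sketch. It is an illustration under assumptions, not the alignment objective defined in the paper: mean pooling for the global view, best-match cosine similarity for the token-level view, and the names `coarse_score`, `fine_score`, `alignment_loss`, and `weight` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mean_pool(tokens):
    """Average a list of token vectors into one global vector."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]

def coarse_score(ts_tokens, text_tokens):
    """Global view (assumed): similarity of the mean-pooled embeddings."""
    return cosine(mean_pool(ts_tokens), mean_pool(text_tokens))

def fine_score(ts_tokens, text_tokens):
    """Token view (assumed): each time series token is matched to its
    most similar text token, and the best matches are averaged."""
    best = [max(cosine(t, w) for w in text_tokens) for t in ts_tokens]
    return sum(best) / len(best)

def alignment_loss(ts_tokens, text_tokens, weight=0.5):
    """Weighted sum of the two dissimilarities; `weight` is a hypothetical knob."""
    coarse = 1.0 - coarse_score(ts_tokens, text_tokens)
    fine = 1.0 - fine_score(ts_tokens, text_tokens)
    return weight * coarse + (1.0 - weight) * fine

if __name__ == "__main__":
    ts = [[1.0, 0.0], [0.0, 1.0]]    # toy time series token embeddings
    txt = [[1.0, 0.0], [0.0, 1.0]]   # toy text token embeddings
    print(alignment_loss(ts, txt))   # close to zero for identical token sets
```

In this toy formulation the loss approaches zero when the two token sets coincide and reaches one when every pair of embeddings is orthogonal, which is the behavior one would expect from any objective that pulls the two modalities into a shared space.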

Paper Structure

This paper contains 34 sections, 8 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Illustration of the proposed HiTime framework for multimodal time series classification, in which hierarchical feature encoders extract temporal representations, semantic space alignment unifies time–text embeddings, and hybrid prompts enable generative instruction tuning with LLMs.
  • Figure 2: Illustration of the designed hybrid prompt template in HiTime.
  • Figure 3: Critical difference diagram over the mean ranks of HiTime and the baseline methods.
  • Figure 4: Comparison of the effects of different prompt components on model accuracy across 10 datasets.
  • Figure 5: Few-shot performance evaluation of HiTime under varying training data ratios (20%–100%) across multiple time series classification benchmarks.
  • ...and 2 more figures