Hierarchical Multimodal LLMs with Semantic Space Alignment for Enhanced Time Series Classification

Xiaoyu Tao; Tingyue Pan; Mingyue Cheng; Yucong Luo; Qi Liu; Enhong Chen

TL;DR

HiTime applies LLMs to time series classification through three components: a hierarchical temporal encoder, a dual-view semantic space alignment module, and a parameter-efficient generative fine-tuning pipeline. The method bridges structured temporal dynamics with linguistic semantics, so that the LLM classifies a series by generating a response to a prompt, which is then mapped to a class label through keyword grounding. Extensive experiments on ten UEA datasets demonstrate strong performance gains, and ablations on the encoders, alignment strategies, and prompt design provide further insight. The work highlights practical trade-offs between semantic reasoning and local temporal cues, offering a scalable path for multimodal time series analysis with large language models.

Abstract

Time series classification plays a fundamental role in a wide range of real-world applications. Recently, large language models (LLMs) have demonstrated strong generalization and reasoning capacities, but directly applying them to time series classification remains non-trivial due to the representation gap between numerical sequences and linguistic semantics. In this paper, we propose HiTime, a hierarchical LLM-based framework for multimodal time series classification that bridges structured temporal representations with semantic reasoning in a generative paradigm. Specifically, we design a hierarchical sequence feature encoding module composed of a data-specific encoder and a task-specific encoder to extract complementary temporal features. To mitigate the embedding gap between time series representations and textual semantics, we further introduce a semantic space alignment module that jointly performs coarse-grained global modeling and fine-grained cross-modal correspondence. Building upon the above representations, we employ a parameter-efficient supervised fine-tuning strategy to activate the generative classification capability of the aligned LLMs, thereby transforming conventional discriminative time series classification into a generative task. Extensive experiments on multiple benchmarks demonstrate that the proposed framework consistently outperforms state-of-the-art baselines. The code is publicly available at https://github.com/Xiaoyu-Tao/HiTime.
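
Because classification is cast as a generative task, the LLM's free-text output has to be mapped back to one of the dataset's class labels. The TL;DR refers to this step as keyword grounding. The following is a minimal sketch of one way such a step could work, not the authors' implementation: the function name `ground_label`, the whole-word matching rule, the earliest-mention tie-break, and the example class names are all assumptions made for illustration.

```python
import re
from typing import Optional, Sequence


def ground_label(generated_text: str, class_names: Sequence[str]) -> Optional[str]:
    """Map generated text to a class label (hypothetical helper).

    Returns the class name whose whole-word, case-insensitive mention
    appears earliest in the text, or None if no class name is mentioned.
    """
    text = generated_text.lower()
    best_name: Optional[str] = None
    best_pos = len(text) + 1
    for name in class_names:
        # Whole-word match so that "run" does not fire on "running".
        match = re.search(r"\b" + re.escape(name.lower()) + r"\b", text)
        if match and match.start() < best_pos:
            best_name, best_pos = name, match.start()
    return best_name


# Illustrative class names; real datasets define their own label sets.
classes = ["standing", "walking", "running"]
prediction = ground_label("The series most likely shows walking activity.", classes)
```

Returning `None` when no class name is found keeps ungrounded generations visible, so they can be counted as errors or handled by a fallback rather than being silently assigned a label.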
