
TS-HTFA: Advancing Time Series Forecasting via Hierarchical Text-Free Alignment with Large Language Models

Pengfei Wang, Huanran Zheng, Qi'ao Xu, Silong Dai, Yiqiao Wang, Wenjing Yue, Wei Zhu, Tianwen Qian, Xiaoling Wang

TL;DR

TS-HTFA introduces a hierarchical, text-free alignment framework that uses adaptive virtual text and a fixed language branch to guide time-series forecasting without paired textual annotations. It aligns input, intermediate, and output distributions through a dynamic adaptive gating module, layer-wise contrastive learning, and an optimal transport loss, transferring language-model knowledge to time-series data. Empirical results on long- and short-term benchmarks demonstrate state-of-the-art performance, and ablations confirm the contribution of each alignment component. The method reduces reliance on textual data, improves cross-modal coherence, and offers practical efficiency through LoRA-based fine-tuning and a scalable dual-tower design.

Abstract

Given the significant potential of large language models (LLMs) in sequence modeling, emerging studies have begun applying them to time-series forecasting. Despite notable progress, existing methods still face two critical challenges: 1) their reliance on large amounts of paired text data, which limits model applicability, and 2) a substantial modality gap between text and time series, which leads to insufficient alignment and suboptimal performance. In this paper, we introduce Hierarchical Text-Free Alignment (TS-HTFA), a novel method that leverages hierarchical alignment to fully exploit the representation capacity of LLMs while eliminating the dependence on text data. Specifically, we replace paired text data with adaptive virtual text based on QR-decomposed word embeddings and learnable prompts. Furthermore, we establish comprehensive cross-modal alignment at three levels: input, feature, and output. Extensive experiments on multiple time-series benchmarks demonstrate that TS-HTFA achieves state-of-the-art performance, significantly improving prediction accuracy and generalization.
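
The abstract mentions word embeddings reduced via QR decomposition but does not spell out the procedure. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation: it factors the transposed embedding matrix and keeps the orthonormal factor as a compact set of basis vectors. The function name `qr_reduce_embeddings`, the random stand-in embeddings, and the sizes are all assumptions made for illustration.

```python
import numpy as np


def qr_reduce_embeddings(word_embeddings: np.ndarray) -> np.ndarray:
    """Return orthonormal basis vectors (as rows) spanning the embedding space.

    word_embeddings has shape (vocab_size, dim). The result has shape
    (k, dim) with k = min(vocab_size, dim).
    """
    # Factor E^T = Q R; the columns of Q are orthonormal and span the same
    # space as the word embeddings (the columns of E^T).
    q, _ = np.linalg.qr(word_embeddings.T, mode="reduced")
    return q.T


# Hypothetical stand-in for a pretrained LLM embedding table.
rng = np.random.default_rng(0)
vocab_size, dim = 1000, 16
embeddings = rng.standard_normal((vocab_size, dim))

basis = qr_reduce_embeddings(embeddings)
print(basis.shape)  # far fewer rows than the vocabulary
```

Under this reading, a vocabulary of many thousands of embeddings collapses to at most `dim` orthonormal vectors, which keeps the key/value set of a later cross-attention step small. A pivoted or truncated variant would be needed to pick fewer than `dim` vectors; that choice is not specified in the text above.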

Paper Structure (32 sections, 13 equations, 13 figures, 6 tables)

Figures (13)

  • Figure S1: Comparison of LLM-for-time-series frameworks. (a) Single-Stream Models; (b) Two-Stream Models; (c) TS-HTFA (Ours).
  • Figure S2: Overview of the proposed TS-HTFA. The framework adopts a dual-branch structure, where the time-series branch is hierarchically aligned with the fixed language branch across input, intermediate, and output distributions. During inference, the language branch is excluded, and predictions are generated solely from the time-series branch.
  • Figure S3: Overview of the proposed TS-GAVTG module for generating virtual paired text data. Time-series data is encoded into tokens and serves as the query in the cross-attention mechanism, with keys and values derived from QR-decomposed word embeddings and learnable prompts. Simultaneously, the self-attention mechanism processes the time series tokens independently. The gating mechanism fuses the outputs from both cross-attention and self-attention to generate virtual paired text tokens.
  • Figure: (1) Electricity Case: 96 $\to$ 96
  • Figure S5: Ablation of different reduction methods on (a) ETTh1 and (b) ETTm1 datasets.
  • ...and 8 more figures
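
The caption of Figure S3 describes a gate that fuses cross-attention (time-series tokens as queries, reduced word embeddings plus learnable prompts as keys and values) with self-attention over the time-series tokens. A rough NumPy sketch of that data flow is given below. It is an illustration under stated assumptions, not the paper's module: the sigmoid gate over concatenated outputs, the single attention head, the absence of learned query/key/value projections, and every name and size here are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention without learned projections."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def gated_virtual_text(ts_tokens, text_bank, w_gate, b_gate):
    """Fuse cross-attention and self-attention outputs with a sigmoid gate.

    ts_tokens: (n, d) time-series tokens, used as queries.
    text_bank: (m, d) reduced word embeddings stacked with learnable prompts.
    w_gate, b_gate: gate parameters of shape (2d, d) and (d,) (assumed form).
    """
    cross = attention(ts_tokens, text_bank, text_bank)
    self_out = attention(ts_tokens, ts_tokens, ts_tokens)
    gate_in = np.concatenate([cross, self_out], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(gate_in @ w_gate + b_gate)))
    # Element-wise convex combination of the two attention outputs.
    return gate * cross + (1.0 - gate) * self_out


rng = np.random.default_rng(0)
n_tokens, dim = 8, 16
ts_tokens = rng.standard_normal((n_tokens, dim))
reduced_words = rng.standard_normal((32, dim))  # stand-in for reduced embeddings
prompts = rng.standard_normal((4, dim))         # stand-in for learnable prompts
text_bank = np.concatenate([reduced_words, prompts], axis=0)
w_gate = rng.standard_normal((2 * dim, dim)) * 0.1
b_gate = np.zeros(dim)

virtual_text = gated_virtual_text(ts_tokens, text_bank, w_gate, b_gate)
print(virtual_text.shape)
```

In this sketch the gate decides, per token and per feature, how much of the output comes from the text side versus the time-series side, so the virtual text tokens keep the same shape as the input tokens.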