CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning
Peiyuan Liu, Hang Guo, Tao Dai, Naiqi Li, Jigang Bao, Xudong Ren, Yong Jiang, Shu-Tao Xia
TL;DR
CALF tackles the distribution mismatch between the textual and time-series modalities in LLM-based multivariate time series forecasting (MTSF) by introducing a two-branch cross-modal fine-tuning framework. It couples a temporal target branch with a textual source branch via the Cross-Modal Match Module, Feature Regularization Loss, and Output Consistency Loss, and uses parameter-efficient training with LoRA and PCA-based synonym clustering to align inputs and outputs. The method achieves state-of-the-art results in long-term and short-term forecasting, as well as in few-shot and zero-shot settings, with lower computational cost than prior LLM-based approaches. This work demonstrates that multi-level cross-modal alignment can unlock robust generalization and practical applicability of LLMs in data-scarce time-series forecasting scenarios.
Abstract
Deep learning (e.g., Transformer) has been widely and successfully used in multivariate time series forecasting (MTSF). Unlike existing methods that train models on a single modality of time series input, MTSF methods based on large language models (LLMs), which take cross-modal text and time series input, have recently shown great superiority, especially with limited temporal data. However, current LLM-based MTSF methods usually focus on adapting and fine-tuning LLMs while neglecting the distribution discrepancy between textual and temporal input tokens, which leads to sub-optimal performance. To address this issue, we propose a novel Cross-Modal LLM Fine-Tuning (CALF) framework for MTSF that reduces the distribution discrepancy between textual and temporal data. It mainly consists of a temporal target branch with temporal input and a textual source branch with aligned textual input. To reduce the distribution discrepancy, we develop the cross-modal match module to first align the cross-modal input distributions. Additionally, to minimize the modality distribution gap in both the feature and output spaces, a feature regularization loss is developed to align the intermediate features of the two branches for better weight updates, while an output consistency loss is introduced so that the output representations of both branches correspond effectively. Thanks to this modality alignment, CALF establishes state-of-the-art performance on both long-term and short-term forecasting tasks with low computational complexity, and exhibits favorable few-shot and zero-shot abilities similar to those of LLMs. Code is available at https://github.com/Hank0626/LLaTA.
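The way the three training signals described above could combine is sketched below in plain Python. This is only an illustration under stated assumptions, not the released implementation: the abstract does not specify the distance measure, the layer weighting, or the loss weights, so the mean absolute difference, the `gamma` decay over layers, and the `lam_feat` and `lam_out` coefficients are all hypothetical choices, as are the function names.

```python
from typing import Sequence

Vector = Sequence[float]


def mean_abs_diff(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors (assumed distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def feature_regularization(
    text_feats: Sequence[Vector], time_feats: Sequence[Vector], gamma: float = 0.8
) -> float:
    """Distance between the two branches' intermediate features, summed over layers.

    Assumption: deeper layers get a larger weight via gamma ** (layers below the top).
    """
    n_layers = len(text_feats)
    return sum(
        gamma ** (n_layers - 1 - i) * mean_abs_diff(t, s)
        for i, (t, s) in enumerate(zip(text_feats, time_feats))
    )


def output_consistency(text_out: Vector, time_out: Vector) -> float:
    """Distance between the outputs of the textual and temporal branches."""
    return mean_abs_diff(text_out, time_out)


def total_loss(
    time_out: Vector,
    target: Vector,
    text_out: Vector,
    text_feats: Sequence[Vector],
    time_feats: Sequence[Vector],
    lam_feat: float = 0.01,  # hypothetical weight
    lam_out: float = 1.0,    # hypothetical weight
    gamma: float = 0.8,      # hypothetical layer decay
) -> float:
    """Supervised forecasting error plus the two alignment terms."""
    supervised = mean_abs_diff(time_out, target)
    feat = feature_regularization(text_feats, time_feats, gamma)
    out = output_consistency(text_out, time_out)
    return supervised + lam_feat * feat + lam_out * out
```

When both branches produce identical features and outputs, the two alignment terms vanish and only the supervised forecasting error remains, which matches the intent that the alignment losses penalize only the gap between modalities.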
