Understanding Why Large Language Models Can Be Ineffective in Time Series Analysis: The Impact of Modality Alignment
Liangwei Nathan Zheng, Chang George Dong, Wei Emma Zhang, Lin Yue, Miao Xu, Olaf Maennel, Weitong Chen
TL;DR
The study critically evaluates the use of Large Language Models (LLMs) for core time series tasks, finding that LLMs provide little to no advantage over simple linear baselines and can distort temporal structure. By dissecting reprogramming techniques and modality alignment through experiments and data-manifold analyses, the authors show that observed gains originate from intrinsic time series characteristics rather than language knowledge. They demonstrate a pervasive issue of pseudo-alignment, where alignment collapses to centroids rather than achieving genuine manifold-level correspondence between time series and language. As a preliminary remedy, they propose a Mixer-based approach that implicitly fuses time series tokens with semantic text to mitigate pseudo-alignment, suggesting a promising path for future multimodal time series reprogramming research. The work has practical implications for deploying time series models that balance performance and computational efficiency, and it cautions against over-reliance on LLMs for time series tasks.
Abstract
Large Language Models (LLMs) have demonstrated impressive performance in time series analysis and seem to understand temporal relationships better than traditional transformer-based approaches. However, since LLMs are not designed for time series tasks, simpler models such as linear regression can often achieve comparable performance with far less complexity. In this study, we perform extensive experiments to assess the effectiveness of applying LLMs to key time series tasks, including forecasting, classification, imputation, and anomaly detection. We compare the performance of LLMs against simpler baseline models, such as single-layer linear models and randomly initialized LLMs. Our results reveal that LLMs offer minimal advantages for these core time series tasks and may even distort the temporal structure of the data. In contrast, simpler models consistently outperform LLMs while requiring far fewer parameters. Furthermore, we analyze existing reprogramming techniques and show, through data manifold analysis, that these methods fail to effectively align time series data with language and display "pseudo-alignment" behavior in embedding space. Our findings suggest that the performance of LLM-based methods in time series tasks arises from the intrinsic characteristics and structure of time series data, rather than any meaningful alignment with the language model architecture.
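To make the notion of a single-layer linear baseline concrete, the following is a minimal sketch of one for forecasting. It is not the authors' implementation: the synthetic sine series, the window sizes, and the helper names `make_windows`, `fit_linear`, and `predict_linear` are all assumptions chosen for illustration, and the layer is fitted by ordinary least squares rather than gradient descent.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (lookback window, future horizon) pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def fit_linear(X, Y):
    """Fit one linear layer (weights plus bias) by least squares."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict_linear(W, X):
    """Apply the fitted linear layer to a batch of lookback windows."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ W

# Hypothetical toy data: a noisy sine wave with period 24 (an assumption,
# not one of the benchmark datasets used in the study).
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(len(t))

X, Y = make_windows(series, lookback=48, horizon=12)

# Chronological split. Adjacent windows overlap, so this is only a rough check.
split = int(0.8 * len(X))
W = fit_linear(X[:split], Y[:split])

mse = float(np.mean((predict_linear(W, X[split:]) - Y[split:]) ** 2))
# Reference point: repeat the last observed value across the whole horizon.
naive_mse = float(np.mean((X[split:, -1:] - Y[split:]) ** 2))

print(f"parameters: {W.size}")
print(f"linear MSE: {mse:.4f}, last-value MSE: {naive_mse:.4f}")
```

The entire model here is a single weight matrix of shape `(lookback + 1, horizon)`, which gives a sense of how small such a baseline is relative to an LLM backbone.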
