LLM-ABBA: Understanding time series via symbolic approximation
Xinye Chen, Erin Carson, Cheng Kang
TL;DR
LLM-ABBA proposes a unified pipeline that symbolizes numerical time series via ABBA and feeds the resulting symbols to pretrained LLMs with adapter-based fine-tuning. The approach achieves state-of-the-art performance on Time Series Extrinsic Regression (TSER) benchmarks and competitive results on several time series classification and forecasting benchmarks. In forecasting, a fixed-point adaptive polygonal chain (FAPCA) curbs the cumulative error that misused symbols introduce when symbols are converted back to numerical values. The work demonstrates that symbolic representations can effectively transfer semantic knowledge to LLMs, enabling robust cross-domain time series analysis on practical hardware, with publicly available code. While it does not surpass every domain-specific SOTA baseline, LLM-ABBA introduces a scalable, multi-task framework that leverages LLMs’ reasoning and linguistic structure for time series tasks.
Abstract
The success of large language models (LLMs) for time series has been demonstrated in previous work. Utilizing a symbolic time series representation, one can efficiently bridge the gap between LLMs and time series. However, the remaining challenge is to exploit the semantic information hidden in time series by using symbols or existing tokens of LLMs, while aligning the embedding space of LLMs according to the hidden information of time series. The symbolic time series approximation (STSA) method called adaptive Brownian bridge-based symbolic aggregation (ABBA) shows outstanding efficacy in preserving salient time series features by modeling time series patterns in terms of amplitude and period while using existing tokens of LLMs. In this paper, we introduce a method, called LLM-ABBA, that integrates ABBA into large language models for various downstream time series tasks. By symbolizing time series, LLM-ABBA compares favorably to the recent state-of-the-art (SOTA) in UCR and three medical time series classification tasks. Meanwhile, a fixed-polygonal chain trick in ABBA is introduced to avoid obvious drifting during forecasting tasks by significantly mitigating the effects of cumulative error arising from misused symbols during the transition from symbols to numerical values. In time series regression tasks, LLM-ABBA achieves the new SOTA on Time Series Extrinsic Regression (TSER) benchmarks. LLM-ABBA also shows competitive forecasting capability compared to recent SOTA time series forecasting results. We believe this framework can also seamlessly extend to other time series tasks. Our simulation code is publicly available at: https://github.com/inEXASCALE/llm-abba
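To make the symbolization idea concrete, the following is a minimal Python sketch of an ABBA-style round trip: compress a series into linear pieces described by (length, increment), group similar pieces under one symbol, and rebuild a series from the symbols. It is an illustration under stated assumptions, not the authors' implementation: the function names `compress`, `digitize`, and `reconstruct` and the parameters `tol` and `radius` are hypothetical, the grouping step uses a simple greedy aggregation in place of the clustering used in ABBA, and the fixed-polygonal chain trick is not included.

```python
import numpy as np


def compress(ts, tol=0.1):
    """Greedy piecewise-linear compression into (length, increment) pieces."""
    pieces, start, n = [], 0, len(ts)
    while start < n - 1:
        end = start + 1
        # Extend the piece while a straight line still fits within the tolerance.
        while end + 1 < n:
            cand = end + 1
            line = np.linspace(ts[start], ts[cand], cand - start + 1)
            err = np.sum((ts[start:cand + 1] - line) ** 2)
            if err > (cand - start) * tol ** 2:
                break
            end = cand
        pieces.append((end - start, ts[end] - ts[start]))
        start = end
    return np.array(pieces, dtype=float)


def digitize(pieces, radius=0.5):
    """Assign one symbol per group of similar pieces (simplified greedy grouping)."""
    scale = pieces.std(axis=0)
    scale[scale == 0] = 1.0          # avoid division by zero for constant columns
    z = pieces / scale               # make lengths and increments comparable
    labels = -np.ones(len(z), dtype=int)
    centers = []
    for i in range(len(z)):
        if labels[i] >= 0:
            continue
        # Start a new group from the first unlabeled piece.
        close = (labels < 0) & (np.linalg.norm(z - z[i], axis=1) <= radius)
        labels[close] = len(centers)
        centers.append(pieces[close].mean(axis=0))
    # Symbols are consecutive characters starting at "a" (an illustrative choice).
    symbols = "".join(chr(ord("a") + k) for k in labels)
    return symbols, np.array(centers)


def reconstruct(symbols, centers, start_value):
    """Chain the group centers back into a numerical series."""
    values = [start_value]
    for s in symbols:
        length, inc = centers[ord(s) - ord("a")]
        length = max(1, int(round(length)))
        # Each piece starts where the previous one ended, so errors accumulate.
        values.extend(np.linspace(values[-1], values[-1] + inc, length + 1)[1:])
    return np.array(values)


if __name__ == "__main__":
    t = np.linspace(0, 4 * np.pi, 200)
    series = np.sin(t)
    pieces = compress(series, tol=0.05)
    symbols, centers = digitize(pieces, radius=0.5)
    approx = reconstruct(symbols, centers, series[0])
    print(symbols, len(approx))
```

Because `reconstruct` starts every piece where the previous one ended, a single wrong symbol shifts all later values. This is the kind of cumulative drift that the abstract says the fixed-polygonal chain trick is meant to mitigate.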
