LLM-ABBA: Understanding time series via symbolic approximation

Xinye Chen, Erin Carson, Cheng Kang

TL;DR

LLM-ABBA proposes a unified pipeline that converts numerical time series into symbols via ABBA and feeds those symbols to pretrained LLMs fine-tuned with adapters. The approach achieves state-of-the-art performance on Time Series Extrinsic Regression (TSER) benchmarks and competitive results on several time series classification and forecasting benchmarks; a fixed-point adaptive polygonal chain (FAPCA) curbs the cumulative error caused by misused symbols. The work demonstrates that symbolic representations can carry the semantic information in time series to LLMs, supporting cross-domain time series analysis on practical hardware, with publicly available code. While it does not surpass every domain-specific SOTA baseline, LLM-ABBA offers a scalable, multi-task framework that leverages LLMs' reasoning and linguistic structure for time series tasks.

Abstract

The success of large language models (LLMs) for time series has been demonstrated in previous work. Utilizing a symbolic time series representation, one can efficiently bridge the gap between LLMs and time series. However, the remaining challenge is to exploit the semantic information hidden in time series by using symbols or existing tokens of LLMs, while aligning the embedding space of LLMs according to the hidden information of time series. The symbolic time series approximation (STSA) method called adaptive Brownian bridge-based symbolic aggregation (ABBA) shows outstanding efficacy in preserving salient time series features by modeling time series patterns in terms of amplitude and period while using existing tokens of LLMs. In this paper, we introduce a method, called LLM-ABBA, that integrates ABBA into large language models for various downstream time series tasks. By symbolizing time series, LLM-ABBA compares favorably to the recent state-of-the-art (SOTA) in UCR and three medical time series classification tasks. Meanwhile, a fixed-polygonal chain trick in ABBA is introduced to avoid obvious drifting during forecasting tasks by significantly mitigating the effects of cumulative error arising from misused symbols during the transition from symbols to numerical values. In time series regression tasks, LLM-ABBA achieves the new SOTA on Time Series Extrinsic Regression (TSER) benchmarks. LLM-ABBA also shows competitive forecasting capability compared to recent SOTA time series forecasting results. We believe this framework can also seamlessly extend to other time series tasks. Our simulation code is publicly available at: https://github.com/inEXASCALE/llm-abba
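The abstract describes ABBA as capturing patterns through amplitude and period, which in practice means reducing a series to pieces described by a length and an increment. The following is a minimal sketch of that compression idea, not the authors' implementation: the function names `compress` and `_max_dev` are hypothetical, and the max-deviation test is a swapped-in simplification of whatever tolerance criterion the published method uses.

```python
def _max_dev(series, i, j):
    """Largest absolute gap between the series and the straight chord from index i to j."""
    slope = (series[j] - series[i]) / (j - i)
    return max(abs(series[i] + slope * (k - i) - series[k]) for k in range(i, j + 1))


def compress(series, tol=0.1):
    """Greedy piecewise-linear compression into (length, increment) pairs.

    Assumption: a piece is extended while the chord stays within `tol`
    of every point it covers (a simplified stand-in criterion).
    """
    pieces = []
    start = 0
    n = len(series)
    while start < n - 1:
        end = start + 1
        # Try to extend the current piece by one more point.
        while end + 1 < n and _max_dev(series, start, end + 1) <= tol:
            end += 1
        pieces.append((end - start, series[end] - series[start]))
        start = end
    return pieces


# A triangular wave collapses to one rising and one falling piece.
pieces = compress([0, 1, 2, 3, 2, 1, 0], tol=0.1)
print(pieces)
```

In a full pipeline, pairs like these would then be clustered and each cluster assigned a symbol, which is the step that produces the strings passed to the LLM tokenizer.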

Paper Structure

This paper contains 29 sections, 5 theorems, 12 equations, 10 figures, 9 tables.

Key Result

Theorem 3.1

Let $(\mu_i^{\text{len}}, \mu_i^{\text{inc}}) = \frac{1}{|S_i|} \sum_{(\text{len}, \text{inc}) \in S_i} (\text{len}, \text{inc})$, and denote the mean sets for $\text{len}$ and $\text{inc}$ by $\mathcal{U}_{\text{len}}=\{\mu_i^{\text{len}}\}_{i=1}^{k}$ and $\mathcal{U}_{\text{inc}}=\{\mu_i^{\text{inc}}\}_{i=1}^{k}$ … where $(\widehat{\text{len}}_{\ell}, \widehat{\text{inc}}_{\ell})$ are the computed cluster centers.
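The per-cluster means in the theorem statement can be illustrated with a small worked example. This is only a sketch of the definition of $(\mu_i^{\text{len}}, \mu_i^{\text{inc}})$ as the average over the pieces in cluster $S_i$; the function name `cluster_means`, the toy pieces, and the labels are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def cluster_means(pieces, labels):
    """Return {cluster label: (mean len, mean inc)} for labelled (len, inc) pieces."""
    groups = defaultdict(list)
    for piece, label in zip(pieces, labels):
        groups[label].append(piece)
    means = {}
    for label, members in groups.items():
        size = len(members)  # |S_i|
        mean_len = sum(length for length, _ in members) / size
        mean_inc = sum(inc for _, inc in members) / size
        means[label] = (mean_len, mean_inc)
    return means


# Toy data: four (len, inc) pieces assigned to two clusters.
pieces = [(3, 1.0), (5, 1.5), (4, -2.0), (4, -1.0)]
labels = [0, 0, 1, 1]
means = cluster_means(pieces, labels)

# The mean sets U_len and U_inc collect the first and second coordinates.
U_len = [means[i][0] for i in sorted(means)]
U_inc = [means[i][1] for i in sorted(means)]
print(U_len, U_inc)
```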

Figures (10)

  • Figure 1: The integration of time series and LLM demonstrates potential in solving complex real-world problems.
  • Figure 2: The left plot shows a sine function with 1,000 points, and the right plot shows the ECGFiveDays time series from the UCR Archive, which contains 136 points. We first perform fABBA with tol = 0.1 and $\alpha=0.1$. Then, we perform SAX with approximately the same symbolic representation length and number of distinct symbols as fABBA. In the sine plot, fABBA generates symbols "aBbCbCbCbCbCbCbCA" (17 symbols with 5 distinct symbols) while SAX generates symbols "aACBbaACBbaACBbaAABb" (20 symbols with 5 distinct symbols). In the ECGFiveDays plot, fABBA generates symbols "EAbACDBdAcaE" (12 symbols with 9 distinct symbols) while SAX generates symbols "AAAAAaBBCcADdEaabaaAAb" (22 symbols with 9 distinct symbols).
  • Figure 3: The model framework of LLM-ABBA: Given an input time series, we first transform and compress the time series to a symbolic series via ① and ①. The symbolic series is tokenized by the LLM's tokenizer ②. The designed instruction that contains the symbolic series is also tokenized by the LLM's tokenizer ②. Additionally, to fine-tune only the pretrained LLM, QLoRA with an inhibition mechanism is used in both ③ and ③. To implement the corresponding tasks, ④ and ⑤ load the LLM according to the type of task. However, ④ loads the LLM for the generation task. Moreover, to convert the symbolic series back to a numerical time series, ⑥ and ⑤ use ABBA to decompress the generated symbolic series. Lastly, in ⑦ and ⑥ the output time series from LLM-ABBA are projected to generate the forecasts.
  • Figure 4: We generate a synthetic trigonometric sine series of 1,000 points, and separately perform symbolic approximation with 4 symbols using APCA (left) and FAPCA (right) on the time series. ABBA with APCA and FAPCA generate symbols "aBbBbBbBbBbBbBbBA" and "abBbBbBbBbBbBbBbA", respectively, associated with their respective perturbed symbols, "abbBbBbBbBbBbBbBA" and "aBBbBbBbBbBbBbBbA". The symbol recovery is performed on correct symbols and perturbed symbols, respectively.
  • Figure 5: Frequency and rank of symbols in various UCR datasets.
  • ...and 5 more figures
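The effect of a perturbed symbol on recovery, as in the Figure 4 caption, can be sketched with a toy reconstruction. This is an illustrative assumption rather than the paper's algorithm: the chain version adds a relative increment per symbol, so one wrong symbol shifts every later point, while the anchored version is a hypothetical stand-in for the fixed-point idea in which each symbol names an absolute end level. All names and values below are made up for the example.

```python
def chain_endpoints(symbols, centers, start=0.0):
    """Endpoint values when each symbol contributes a relative increment."""
    values = [start]
    for symbol in symbols:
        _, inc = centers[symbol]
        values.append(values[-1] + inc)
    return values


def anchored_endpoints(symbols, levels, start=0.0):
    """Endpoint values when each symbol maps to an absolute end level (hypothetical)."""
    return [start] + [levels[symbol] for symbol in symbols]


centers = {"a": (10, 1.0), "b": (10, -1.0)}  # symbol -> (len, inc)
levels = {"a": 1.0, "b": 0.0}                # symbol -> absolute end value

clean = "abababab"
noisy = "aaababab"  # the second symbol is misused

drift_chain = [abs(x - y) for x, y in
               zip(chain_endpoints(clean, centers), chain_endpoints(noisy, centers))]
drift_anchor = [abs(x - y) for x, y in
                zip(anchored_endpoints(clean, levels), anchored_endpoints(noisy, levels))]

print(drift_chain)   # error persists after the misused symbol
print(drift_anchor)  # error stays local to the misused symbol
```

In this toy setting the chain reconstruction carries the offset to the end of the series, whereas the anchored reconstruction recovers at the next correct symbol, which mirrors the drift behaviour the abstract attributes to cumulative error.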

Theorems & Definitions (6)

  • Theorem 3.1
  • Theorem 3.2: EG19b
  • Theorem 3.3
  • Theorem 3.4
  • Theorem 3.5
  • Proof: Proof of Theorem \ref{thm:deviate4}