Joint symbolic aggregate approximation of time series

Xinye Chen

TL;DR

This work addresses the scalability and cross-series consistency limitations of state-of-the-art symbolic time-series representations, notably ABBA/fABBA. It introduces JABBA, a joint symbolic approximation framework that enforces symbolic consistency across multiple series and enables parallel compression, with two digitization strategies: JABBA (VQ) using vector quantization and JABBA (GA) using greedy aggregation. A key innovation is auto digitization, which derives the digitization parameter from the compression tolerance via Brownian-bridge modeling, yielding an error-bounded, non-parametric process. Empirical results on multivariate datasets and synthetic data show substantial speedups and competitive reconstruction accuracy, validating JABBA's practical impact for large-scale time-series analysis and enabling integration with natural language processing techniques through consistent symbolic representations. The approach broadens the applicability of symbolic time-series methods to high-throughput, multi-series contexts while maintaining reconstruction fidelity and interpretability.

Abstract

The increasing availability of temporal data poses a challenge to the time-series and signal-processing domains because of its high numerosity and complexity. Symbolic representation outperforms raw data in a variety of engineering applications thanks to its storage efficiency, reduced numerosity, and noise reduction. ABBA, the most recent symbolic aggregate approximation technique, demonstrates outstanding performance in preserving the essential shape information of time series and in enhancing downstream applications. However, ABBA cannot represent multiple time series with consistent symbols, i.e., the same symbol in distinct time series does not carry the same meaning. Moreover, choosing an appropriate ABBA digitization involves the tedious task of tuning hyperparameters such as the number of symbols or the tolerance. We therefore present a joint symbolic aggregate approximation that has symbolic consistency, and show how the digitization hyperparameter can itself be optimized alongside the compression tolerance ahead of time. In addition, we propose a novel computing paradigm that enables parallel computation of the symbolic approximation. Extensive experiments demonstrate its superb performance and outstanding speed in symbolic approximation and reconstruction.
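
The idea of symbolic consistency can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`compress`, `greedy_aggregate`, `joint_symbolize`), the simplified error criterion, the absence of normalization and scaling, and the use of threads for the fork-and-join step are all assumptions made for illustration. Each series is compressed independently into (length, increment) pieces, the pieces from all series are pooled, and a single greedy aggregation assigns symbols, so that a given symbol refers to the same group center in every series.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def _line_error(series, i, j):
    """Squared error between series[i..j] and the line joining its endpoints."""
    slope = (series[j] - series[i]) / (j - i)
    return sum((series[i] + slope * (k - i) - series[k]) ** 2
               for k in range(i, j + 1))


def compress(series, tol=0.1):
    """Greedy piecewise-linear compression into (length, increment) pieces.

    Simplified criterion (an assumption): a piece is extended while its
    squared error stays below (piece length) * tol**2.
    """
    pieces, start = [], 0
    while start < len(series) - 1:
        end = start + 1
        while (end + 1 < len(series)
               and _line_error(series, start, end + 1)
               <= (end + 1 - start) * tol ** 2):
            end += 1
        pieces.append((end - start, series[end] - series[start]))
        start = end
    return pieces


def greedy_aggregate(pieces, alpha):
    """Group pieces: the first unassigned piece (sorted by norm) becomes a
    starting point and absorbs every unassigned piece within distance alpha."""
    order = sorted(range(len(pieces)), key=lambda i: math.hypot(*pieces[i]))
    labels, n_groups = [-1] * len(pieces), 0
    for i in order:
        if labels[i] != -1:
            continue
        for j in order:
            if labels[j] == -1 and math.dist(pieces[i], pieces[j]) <= alpha:
                labels[j] = n_groups
        n_groups += 1
    return labels, n_groups


def joint_symbolize(collection, tol=0.1, alpha=0.5):
    """Compress every series independently, then digitize all pieces jointly."""
    # Fork: compression tasks do not depend on one another.
    with ThreadPoolExecutor() as pool:
        per_series = list(pool.map(lambda s: compress(s, tol), collection))
    # Join: pool the pieces and digitize them once, with shared groups.
    pooled = [p for pieces in per_series for p in pieces]
    labels, n_groups = greedy_aggregate(pooled, alpha)
    centers = []
    for g in range(n_groups):
        members = [p for p, lab in zip(pooled, labels) if lab == g]
        centers.append((sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members)))
    # Split the shared labels back into one symbolic string per series.
    strings, pos = [], 0
    for pieces in per_series:
        chunk = labels[pos:pos + len(pieces)]
        strings.append("".join(chr(ord("a") + lab) for lab in chunk))
        pos += len(pieces)
    return strings, centers


if __name__ == "__main__":
    up_down = [0, 1, 2, 3, 2, 1, 0]
    shifted = [5, 6, 7, 8, 7, 6, 5]  # same shape, different offset
    print(joint_symbolize([up_down, shifted]))
```

In this sketch the two example series have the same shape, so they receive the same symbolic string, and each symbol maps to a single shared center. Digitizing each series separately would give no such guarantee, because each series would get its own set of centers.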

Paper Structure

This paper contains 15 sections, 2 theorems, 26 equations, 9 figures, 3 tables, 6 algorithms.

Key Result

Lemma 1

Given an arbitrary data point $p$ (possibly the starting point) in a group $S$ whose mean center is denoted by $\mu$, we have:

Figures (9)

  • Figure 1: Partition of 2-dimensional data using vector quantization (achieved by k-means clustering) and aggregation, with 26 groups. Aggregation finishes the task in 0.025 seconds, whereas k-means takes 0.18 seconds. The dark points mark the starting points (aggregation) and the cluster centers (k-means).
  • Figure 2: Image segmentation with VQ and GA: The three images are achieved by 2,332 and 678 clusters.
  • Figure 3: Performance comparison of k-means++ and sampling-based k-means.
  • Figure 4: Value of $\alpha$ as tol increases.
  • Figure 5: Fork-and-join model: since the compression tasks do not depend on one another, they are easy to parallelize with the fork-and-join model.
  • ...and 4 more figures

Theorems & Definitions (2)

  • Lemma 1
  • Lemma 2: EG19b