Joint symbolic aggregate approximation of time series
Xinye Chen
TL;DR
This work addresses two limitations of state-of-the-art symbolic time-series representations, notably ABBA/fABBA: limited scalability and inconsistent symbols across series. It introduces JABBA, a joint symbolic approximation framework that enforces symbolic consistency across multiple series and enables parallel compression, with two digitization strategies: JABBA (VQ), which uses vector quantization, and JABBA (GA), which uses greedy aggregation. A key innovation is auto digitization, which derives the digitization parameter from the compression tolerance via Brownian-bridge modeling, yielding an error-bounded, non-parametric process. Empirical results on multivariate datasets and synthetic data show substantial speedups and competitive reconstruction accuracy. Because the symbols are consistent across series, the representation can also be combined with natural language processing techniques, which extends symbolic time-series methods to high-throughput, multi-series settings while maintaining reconstruction fidelity and interpretability.
Abstract
The increasing availability of temporal data poses a challenge to the time-series and signal-processing domains because of its high numerosity and complexity. Symbolic representations outperform raw data in a variety of engineering applications thanks to their storage efficiency, reduced numerosity, and noise reduction. A recent symbolic aggregate approximation technique, ABBA, performs very well at preserving the essential shape information of time series and at improving downstream applications. However, ABBA cannot represent multiple time series with consistent symbols, i.e., the same symbol may stand for different patterns in different time series. In addition, choosing an appropriate ABBA digitization involves the tedious task of tuning hyperparameters such as the number of symbols or the tolerance. We therefore present a joint symbolic aggregate approximation that has symbolic consistency, and show how the digitization hyperparameter can itself be optimized alongside the compression tolerance ahead of time. We also propose a novel computing paradigm that enables parallel computation of the symbolic approximation. Extensive experiments demonstrate strong performance and speed in both symbolic approximation and reconstruction.
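To make the idea of symbolic consistency concrete, the following is a minimal sketch, not the authors' implementation or the API of any ABBA/JABBA library. It assumes a simplified tolerance criterion for the piecewise-linear compression and uses a plain k-means loop as a stand-in for vector quantization. The function names `compress`, `kmeans`, and `joint_symbolize` are hypothetical. The point it illustrates is that pieces from all series are pooled before clustering, so every series is encoded with the same set of centers and a given symbol means the same thing everywhere.

```python
import numpy as np

def compress(ts, tol):
    """Greedy piecewise-linear compression into (length, increment) pieces.
    Assumed simplified criterion: squared error <= (number of steps) * tol**2."""
    pieces, start, n = [], 0, len(ts)
    while start < n - 1:
        end = start + 1
        while end + 1 < n:
            cand = end + 1
            line = np.linspace(ts[start], ts[cand], cand - start + 1)
            err = np.sum((line - ts[start:cand + 1]) ** 2)
            if err > (cand - start) * tol ** 2:
                break
            end = cand
        pieces.append((end - start, ts[end] - ts[start]))
        start = end
    return np.array(pieces, dtype=float).reshape(-1, 2)

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iteration (stand-in for the vector quantization step)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = points[labels == j].mean(axis=0)
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dist.argmin(axis=1)

def joint_symbolize(series_list, tol=0.1, k=4):
    """Encode several series with ONE shared symbol set (k <= 26 here)."""
    # Each series is compressed independently, so this step could run in parallel.
    per_series = [compress(np.asarray(s, dtype=float), tol) for s in series_list]
    pooled = np.vstack(per_series)
    scale = pooled.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant columns
    centers, labels = kmeans(pooled / scale, k)
    alphabet = [chr(ord("a") + j) for j in range(k)]
    strings, pos = [], 0
    for p in per_series:
        strings.append("".join(alphabet[j] for j in labels[pos:pos + len(p)]))
        pos += len(p)
    return strings, centers * scale

if __name__ == "__main__":
    t = np.linspace(0, 4 * np.pi, 100)
    strings, centers = joint_symbolize([np.sin(t), np.cos(t)], tol=0.1, k=4)
    print(strings)
    print(centers)  # shared (length, increment) centers, one row per symbol
```

Clustering each series separately would instead produce one set of centers per series, which is the inconsistency described above: the symbol `a` in one string would not correspond to the same (length, increment) pattern as `a` in another.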
