STaTS: Structure-Aware Temporal Sequence Summarization via Statistical Window Merging

Disharee Bhowmick, Ranjith Ramanathan, Sathyanarayanan N. Aakur

TL;DR

STaTS is a structure-aware temporal summarization method that compresses time series by detecting statistically coherent segments with multi-scale BIC-based change detection and summarizing each segment with a simple token. This model-agnostic preprocessor yields BIC-guided tokens $\tilde{\boldsymbol{X}} \in \mathbb{R}^{T'\times d}$, enabling downstream encoders to operate on significantly shorter sequences without retraining, while preserving core temporal dynamics. Across univariate/multivariate classification and long-horizon forecasting on 150+ datasets, STaTS achieves about $85$–$90\%$ of full-model performance at roughly $10$–$30\times$ compression and improves robustness to noise relative to uniform and clustering-based baselines. These results demonstrate a scalable, principled approach to structure-aware time series modeling with practical impact for efficient learning on long or noisy sequences.

Abstract

Time series data often contain latent temporal structure, such as transitions between locally stationary regimes, repeated motifs, and bursts of variability, that is rarely leveraged in standard representation learning pipelines. Existing models typically operate on raw or fixed-window sequences, treating all time steps as equally informative, which leads to inefficiencies, poor robustness, and limited scalability in long or noisy sequences. We propose STaTS, a lightweight, unsupervised framework for Structure-Aware Temporal Summarization that adaptively compresses both univariate and multivariate time series into compact, information-preserving token sequences. STaTS detects change points across multiple temporal resolutions using a BIC-based statistical divergence criterion, then summarizes each segment using simple functions like the mean or generative models such as GMMs. This process achieves up to 30× sequence compression while retaining core temporal dynamics. STaTS operates as a model-agnostic preprocessor and can be integrated with existing unsupervised time series encoders without retraining. Extensive experiments on 150+ datasets, including classification tasks on the UCR-85, UCR-128, and UEA-30 archives, and forecasting on ETTh1, ETTh2, ETTm1, and Electricity, demonstrate that STaTS enables 85-90% of the full-model performance while offering dramatic reductions in computational cost. Moreover, STaTS improves robustness under noise and preserves discriminative structure, outperforming uniform and clustering-based compression baselines. These results position STaTS as a principled, general-purpose solution for efficient, structure-aware time series modeling.
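
To make the idea of BIC-guided window merging with mean summarization concrete, here is a minimal sketch. It is not the authors' algorithm: it works at a single scale rather than across multiple temporal resolutions, it assumes a diagonal Gaussian model per segment, and it merges greedily from left to right. The names `summarize`, `should_merge`, and `gaussian_neg2_loglik`, the `window` parameter, and the variance floor `eps` are all hypothetical choices made for illustration. Two adjacent windows are merged when one Gaussian over their union has a BIC no higher than two separate Gaussians.

```python
import numpy as np


def gaussian_neg2_loglik(x, eps=1e-6):
    """-2 * log-likelihood of x (n, d) under a diagonal Gaussian.

    The mean is the sample mean; the variance is the ML variance plus a
    small floor (eps, an assumption) so constant windows stay finite.
    """
    n = x.shape[0]
    var_ml = x.var(axis=0)
    v = var_ml + eps
    return float(n * np.sum(np.log(2.0 * np.pi * v) + var_ml / v))


def should_merge(a, b):
    """True if one Gaussian over a+b has BIC <= two separate Gaussians."""
    n, d = a.shape[0] + b.shape[0], a.shape[1]
    penalty = 2 * d * np.log(n)  # one mean and one variance per channel
    bic_one = gaussian_neg2_loglik(np.vstack([a, b])) + penalty
    bic_two = gaussian_neg2_loglik(a) + gaussian_neg2_loglik(b) + 2 * penalty
    return bic_one <= bic_two


def summarize(x, window=8):
    """Compress x (T,) or (T, d) into mean tokens of shape (T', d).

    Returns (tokens, segments), where segments is a list of
    half-open (start, end) index pairs, one per token.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    T = x.shape[0]
    segments = []
    for s in range(0, T, window):
        e = min(s + window, T)
        if segments:
            ps, pe = segments[-1]
            if should_merge(x[ps:pe], x[s:e]):
                segments[-1] = (ps, e)  # extend the current segment
                continue
        segments.append((s, e))  # statistically distinct: start a new one
    tokens = np.stack([x[s:e].mean(axis=0) for s, e in segments])
    return tokens, segments
```

On a noise-free step signal such as `np.concatenate([np.zeros(64), np.full(64, 5.0)])`, this sketch returns two tokens, one per regime, so 128 time steps shrink to 2. On noisy data the number of tokens depends on the window size and on how strongly the BIC penalty outweighs the likelihood gain from splitting.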

Paper Structure

This paper contains 24 sections, 6 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Visual overview of STaTS-based summarization. (Top) A multivariate time series with detected splits and true change points. (Bottom) The summarized sequence, where each token $S_i$ represents a segment. STaTS does not aim to recover true changes; segments are detected and summarized to capture structure relevant for downstream tasks.
  • Figure 2: Average normalized MSE across increasing forecast horizons on four multivariate datasets. TS2Vec (mean), using STaTS-based summarization, shows strong long-term performance and outperforms Informer, TCN, and LogTrans beyond 300 steps, despite using inputs compressed by more than 15×. LSTNet degrades significantly at longer horizons, while TS2Vec (ori) remains the strongest short-range model.
  • Figure 3: Qualitative forecasts at (a) short (H=24), (b) mid (H=168), (c) long (H=336), and (d) very long (H=720) horizons on the Electricity dataset. TS2Vec trained with STaTS (red) better tracks the ground truth (black) than the original TS2Vec (blue), particularly at longer horizons.
  • Figure 4: t-SNE visualization of learned representations on the GestureMidAirD3 dataset. Despite operating on highly compressed inputs, TS2Vec (mean) with STaTS (right) preserves clear class separation and compactness comparable to the original TS2Vec (left), suggesting that summarization retains task-relevant structure while reducing redundancy.