Efficient High-Resolution Time Series Classification via Attention Kronecker Decomposition

Aosong Feng, Jialin Chen, Juan Garza, Brooklyn Berry, Francisco Salazar, Yifeng Gao, Rex Ying, Leandros Tassiulas

TL;DR

High-resolution time series classification is limited by the quadratic cost of attention and by noise in very long sequences. KronTime introduces hierarchical time series encoding and Kronecker-decomposed attention to capture multi-scale dependencies at lower cost, reducing attention complexity from $O(n^2)$ toward $O(n \log n)$. Implemented on a PatchTST backbone, KronTime maintains competitive accuracy while delivering substantial efficiency gains on four long time series datasets. This approach enables scalable, robust classification of long, high-fidelity temporal data in real-world applications.
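
To make the complexity claim concrete, here is a rough cost count rather than the paper's own derivation. It assumes the sequence length factorizes as $n = \prod_{l=1}^{L} n_l$, that each level computes one $n_l \times n_l$ attention matrix with all other tensor dimensions flattened, and that the feature width $d$ is fixed. The symbols $n_l$, $L$, and $d$ are introduced here for illustration.

$$
\underbrace{n^2 d}_{\text{full attention}}
\quad\longrightarrow\quad
\sum_{l=1}^{L} n_l^2 \cdot \frac{n}{n_l}\, d \;=\; n\, d \sum_{l=1}^{L} n_l .
$$

With the finest possible factorization, $n_l = 2$ and $L = \log_2 n$, the sum becomes $2\, n\, d \log_2 n$, which is $O(n \log n)$ for fixed $d$. For example, with $n = 4096$ the quadratic term is $n^2 = 16{,}777{,}216$, while $2 n \log_2 n = 2 \cdot 4096 \cdot 12 = 98{,}304$. Coarser factorizations with fewer, larger levels fall between the two extremes, which is consistent with the "toward" wording above.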

Abstract

High-resolution time series classification is increasingly important given the growing availability of detailed temporal data in various domains. To tackle this problem effectively, state-of-the-art attention models must scale to the growing sequence lengths typical of high-resolution time series while remaining robust to the noise inherent in such data. To this end, we propose to hierarchically encode long time series into multiple levels based on interaction ranges. Capturing relationships at different levels yields more robust, expressive, and efficient models that capture both short-term fluctuations and long-term trends in the data. We then propose a new time series transformer backbone (KronTime) that introduces Kronecker-decomposed attention to process such multi-level time series, calculating attention sequentially from the lower level to the upper level. Experiments on four long time series datasets demonstrate superior classification results with improved efficiency compared to baseline methods.

Paper Structure (13 sections, 4 equations, 3 figures, 1 table, 1 algorithm)

Figures (3)

  • Figure 1: (a) After patching, the long time series can be decomposed into multiple levels. The first, second, and third levels encode adjacent, mid-range, and long-range global information, respectively. (b) Input sequences $\mathbf{q}, \mathbf{k}, \mathbf{v}$ are first tensorized into $\bm{\mathcal{Q}}, \bm{\mathcal{K}}, \bm{\mathcal{V}}$. Each row in the middle represents the attention along one matching dimension of the tensors, with all dimensions of $\bm{\mathcal{Q}}$ and $\bm{\mathcal{K}}$ except the matching dimension flattened. The result from each row is used to sequentially update the value tensor $\bm{\mathcal{V}}$.
  • Figure 3: Comparison of running time and GPU memory usage with different input lengths.
  • Figure 4: The validation accuracy with different Kronecker decomposition strategies (upper: number of levels decomposed; lower: different decomposition with 2 levels) during the training phase.
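
The procedure described in the Figure 1(b) caption can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation. The function names (`kron_attention`, `level_attention`, `apply_along_axis`) and the `level_shape` parameter are hypothetical. The sketch also makes several assumptions not confirmed by the text above: a single head with no learned projections, scores scaled by the square root of the flattened width, attention matrices computed from fixed $\bm{\mathcal{Q}}$ and $\bm{\mathcal{K}}$, and the last tensor axis treated as the lowest (adjacent-range) level.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def level_attention(Q, K, axis):
    # Attention matrix along one tensor axis: every other axis
    # (including the feature axis) is flattened into the inner dimension.
    n_l = Q.shape[axis]
    q = np.moveaxis(Q, axis, 0).reshape(n_l, -1)
    k = np.moveaxis(K, axis, 0).reshape(n_l, -1)
    # Scaling by sqrt(flattened width) is an assumption of this sketch.
    return softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)

def apply_along_axis(A, V, axis):
    # Mix V along `axis` with the (n_l, n_l) matrix A, leaving other axes alone.
    v = np.moveaxis(V, axis, 0)
    out = np.tensordot(A, v, axes=([1], [0]))
    return np.moveaxis(out, 0, axis)

def kron_attention(q, k, v, level_shape):
    # q, k, v: arrays of shape (n, d); level_shape: factors whose product is n.
    n, d = q.shape
    assert int(np.prod(level_shape)) == n, "level_shape must factorize n"
    # Tensorize: (n, d) -> (n_1, ..., n_L, d) using row-major order,
    # so the last level axis indexes adjacent positions.
    Q = q.reshape(*level_shape, d)
    K = k.reshape(*level_shape, d)
    V = v.reshape(*level_shape, d)
    mats = []
    # Assumed order: from the lowest level (last axis) to the highest (first axis).
    for axis in reversed(range(len(level_shape))):
        A = level_attention(Q, K, axis)
        V = apply_along_axis(A, V, axis)  # sequentially update the value tensor
        mats.append(A)
    # Return the per-level matrices ordered by axis (0 .. L-1).
    return V.reshape(n, d), mats[::-1]

rng = np.random.default_rng(0)
n, d = 24, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out, mats = kron_attention(q, k, v, level_shape=(2, 3, 4))
```

Under these assumptions, the sequential per-level updates are equivalent to multiplying `v` by the Kronecker product of the small per-level attention matrices, without ever forming the full $n \times n$ matrix. That equivalence is what the name "Kronecker-decomposed attention" suggests, though the paper's exact formulation may differ.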