Hi-WaveTST: A Hybrid High-Frequency Wavelet-Transformer for Time-Series Classification

Huseyin Goksu

TL;DR

Time-series classification with Transformers often treats patches as independent tokens, overlooking high-frequency information. Hi-WaveTST addresses this by augmenting the temporal patch stream with a parallel Wavelet Feature Stream built from deep Wavelet Packet Decomposition (WPD) and learnable Generalized Mean (GeM) pooling, producing a compact high-frequency representation that complements temporal dynamics. On the UCI-HAR benchmark, Hi-WaveTST achieves state-of-the-art accuracy and robustness, and an extensive ablation study shows the necessity of the hybrid architecture, deep high-frequency features, the db2 wavelet basis, and GeM pooling. This fusion of time-domain tokens and time-frequency features points to a practical way to improve performance in human activity recognition (HAR) and other time-series domains, drawing on both modern deep learning and classical signal-processing tools.

Abstract

Transformers have become state-of-the-art (SOTA) for time-series classification, with models like PatchTST demonstrating exceptional performance. These models rely on patching the time series and learning relationships between raw temporal data blocks. We argue that this approach is blind to critical, non-obvious high-frequency information that is complementary to the temporal dynamics. In this letter, we propose Hi-WaveTST, a novel Hybrid architecture that augments the original temporal patch with a learnable, High-Frequency wavelet feature stream. Our wavelet stream uses a deep Wavelet Packet Decomposition (WPD) on each patch and extracts features using a learnable Generalized Mean (GeM) pooling layer. On the UCI-HAR benchmark dataset, our hybrid model achieves a mean accuracy of 93.38% ± 0.0043, significantly outperforming the SOTA PatchTST baseline (92.59% ± 0.0039). A comprehensive ablation study proves that every component of our design (the hybrid architecture, the deep high-frequency wavelet decomposition, and the learnable GeM pooling) is essential for this state-of-the-art performance.
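
As a rough illustration of the wavelet stream's first stage, the sketch below runs a level-3 wavelet packet decomposition with the db2 basis on a single patch. It is not the authors' implementation: the patch length of 16, the periodic boundary handling, the toy signal, and the helper names `analysis_step` and `wavelet_packet` are all assumptions made for the example. A level-3 packet tree has 2^3 = 8 leaf sub-bands, which appears consistent with the eight GeM pooling layers reported for the champion model.

```python
import math

# Orthonormal Daubechies-2 (db2) scaling filter and its quadrature-mirror
# high-pass counterpart (4 taps each).
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
DB2_LO = [(1 + _S3) / _NORM, (3 + _S3) / _NORM,
          (3 - _S3) / _NORM, (1 - _S3) / _NORM]
DB2_HI = [(-1) ** k * DB2_LO[3 - k] for k in range(4)]


def analysis_step(x, filt):
    """Filter x circularly and keep every second output sample.

    Periodic boundary handling is an assumption of this sketch.
    """
    n = len(x)
    return [sum(filt[k] * x[(2 * i + k) % n] for k in range(len(filt)))
            for i in range(n // 2)]


def wavelet_packet(x, level):
    """Full packet tree: every node is split, not only the low-pass branch."""
    nodes = [list(x)]
    for _ in range(level):
        nodes = [band
                 for node in nodes
                 for band in (analysis_step(node, DB2_LO),
                              analysis_step(node, DB2_HI))]
    return nodes  # 2**level leaf sub-bands, natural (not frequency-sorted) order


# Hypothetical patch of 16 samples: a slow component plus a fast one.
patch = [math.sin(0.4 * t) + 0.3 * math.sin(2.9 * t) for t in range(16)]
bands = wavelet_packet(patch, level=3)
print(len(bands), [len(b) for b in bands])
```

Splitting the high-pass branches as well as the low-pass one is what distinguishes a packet decomposition from a standard discrete wavelet transform, and it is what gives the stream finer resolution in the high-frequency range. Each sub-band would then be reduced to a single feature by a pooling layer.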

Paper Structure

This paper contains 17 sections, 1 equation, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Model Architecture Comparison. (A) The baseline PatchTST model, which uses only raw temporal patches. (B) Our proposed Hi-WaveTST, which features a dual stream. It concatenates the raw temporal patch (Temporal Stream) with a new wavelet feature token (Wavelet Stream) derived from WPD and learnable GeM pooling.
  • Figure 2: Main results and ablation study. Our champion model, 'Hi-WaveTST (L3, db2, GeM)', achieves the highest mean accuracy, outperforming the baseline and all ablation variants.
  • Figure 3: Final learned $p$-values for the 8 GeM pooling layers in our champion model ('Hi-WaveTST (L3, db2, GeM)'). The values consistently converge near $p=3.0$, indicating a learned preference for a non-linear pooling strategy over simple averaging ($p=1$).
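
To make the Figure 3 caption concrete, here is a minimal sketch of Generalized Mean pooling with a fixed exponent $p$. In the model $p$ is a learnable parameter; here it is a plain float. Pooling the absolute values of the coefficients, the `eps` floor, and the example coefficients are assumptions of this sketch rather than details confirmed by the source.

```python
import math


def gem_pool(values, p, eps=1e-6):
    """Generalized Mean of |values|: (mean(|v| ** p)) ** (1 / p).

    Taking magnitudes and flooring them at eps are assumptions; they keep
    the power well defined for signed wavelet coefficients.
    """
    mags = [max(abs(v), eps) for v in values]
    return (sum(m ** p for m in mags) / len(mags)) ** (1.0 / p)


# Hypothetical sub-band: three small coefficients and one large one.
coeffs = [0.1, -0.2, 0.05, 2.0]

avg = gem_pool(coeffs, p=1.0)    # p = 1: plain average of magnitudes
mid = gem_pool(coeffs, p=3.0)    # p near 3: large coefficients weigh more
peak = gem_pool(coeffs, p=50.0)  # large p: approaches the maximum magnitude
print(avg, mid, peak)
```

With $p=1$ the layer reduces to average pooling, and as $p$ grows it approaches max pooling, so a learned value near 3 sits between the two and emphasizes the strongest coefficients in a sub-band without discarding the rest.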