
Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification

Yudong Han, Haocong Wang, Yupeng Hu, Yongshun Gong, Xuemeng Song, Weili Guan

TL;DR

This paper tackles two shortcomings of transformer-based masked time-series modeling for classification: feature homogenization due to long-range interaction and energy bias toward low frequencies. It introduces a Content-aware Balanced Decoder (CBD) with two units—Content-aware Interaction Modulation (CIM) and Spectrum Energy Rebalance (SER)—to refine spectrum-space encoding through dynamic, content-driven interactions and Bernstein-polynomial energy rebalancing, guided by a dual-constraint training objective. The approach is instantiated as an iterative CBD across layers and demonstrated on ten diverse datasets, yielding state-of-the-art or near-state-of-the-art results in both linear evaluation and fine-tuning, with thorough analyses of CIM, SER and CBD generality. The results indicate that aligning temporal and spectral representations during pretraining enhances generalization and interpretability, offering a practical path to more robust time-series representations in real-world tasks.
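The SER unit is described as rebalancing spectral energy with Bernstein polynomials. The sketch below shows one plausible way such a rebalancing could work. It assumes the polynomial acts as a per-frequency gain on the rFFT spectrum, defined over a normalized frequency axis $\omega \in [0, 1]$. The function names `bernstein_basis` and `rebalance_spectrum` and the hand-set coefficients `theta` are hypothetical; this is not the authors' implementation.

```python
import numpy as np
from math import comb


def bernstein_basis(omega: np.ndarray, order: int) -> np.ndarray:
    """Evaluate the order+1 Bernstein basis polynomials b_{k,order}(omega).

    omega holds normalized frequencies in [0, 1]; the result has shape
    (order + 1, len(omega)).
    """
    return np.stack([
        comb(order, k) * omega**k * (1.0 - omega) ** (order - k)
        for k in range(order + 1)
    ])


def rebalance_spectrum(x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Rescale the spectrum of a real 1-D series with a Bernstein-polynomial gain.

    theta holds order+1 coefficients. In a trained model they would be
    learnable parameters (assumption); here they are fixed by hand.
    """
    spectrum = np.fft.rfft(x)
    # Normalized frequency axis: 0 is the DC bin, 1 is the highest rFFT bin
    # (the Nyquist frequency when the series length is even).
    omega = np.linspace(0.0, 1.0, spectrum.shape[-1])
    # Gain h(omega) = sum_k theta_k * b_{k,K}(omega), one value per frequency bin.
    gain = theta @ bernstein_basis(omega, len(theta) - 1)
    # Re-weight every frequency bin, then return to the time domain.
    return np.fft.irfft(spectrum * gain, n=x.shape[-1])
```

Because the Bernstein basis sums to 1 at every $\omega$, setting all coefficients to 1 leaves the series unchanged. A coefficient vector such as `np.array([0.5, 1.0, 2.0])` gives a gain of 0.5 at DC and 2.0 at the top bin, lifting high-frequency energy relative to low-frequency energy. Each basis polynomial is non-negative on $[0, 1]$, so non-negative coefficients always yield a non-negative gain, which makes this a convenient parameterization for an energy reweighting.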

Abstract

Owing to their superior ability to capture global dependencies, transformers and their variants have become the primary choice for Masked Time-series Modeling (MTM) in time-series classification. In this paper, we experimentally show that existing transformer-based MTM methods suffer from two under-explored issues when dealing with time-series data: (1) they encode features by performing long-range ensemble averaging, which easily leads to rank collapse and feature homogenization as the layers go deeper; (2) they fit the different frequency components of a time series with different priorities, inevitably leading to a spectrum energy imbalance in the encoded features. To tackle these issues, we propose an auxiliary content-aware balanced decoder (CBD) that optimizes the encoding quality in the spectrum space within the masked modeling scheme. Specifically, the CBD iterates over a series of fundamental blocks; thanks to two tailored units, each block progressively refines the masked representation by adjusting the interaction pattern according to local content variations of the time series and by learning to recalibrate the energy distribution across frequency components. Moreover, a dual-constraint loss is devised to promote the mutual optimization of the vanilla decoder and our CBD. Extensive experiments on ten time-series classification datasets show that our method outperforms or closely matches a range of baselines. Meanwhile, a series of explanatory results is presented to clarify the behavior of our method.
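Issue (1) can be illustrated with a toy experiment: if every layer replaces each token with a softmax-weighted average of all tokens, the tokens are pulled toward a common vector and the token matrix drifts toward rank 1. The sketch below assumes pure attention with random query/key projections and no residual connections, MLPs, or normalization. This is an idealization, not the paper's encoder, and `averaging_spread_curve` and `token_spread` are hypothetical helpers.

```python
import numpy as np


def softmax_rows(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax: every row becomes a strictly positive probability vector."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def token_spread(tokens: np.ndarray) -> float:
    """Largest per-feature range across tokens; 0 means all tokens are identical (rank <= 1)."""
    return float((tokens.max(axis=0) - tokens.min(axis=0)).max())


def averaging_spread_curve(tokens: np.ndarray, depth: int, seed: int = 0) -> list:
    """Apply `depth` rounds of pure attention averaging and record the token spread.

    The first entry is the spread of the input; each later entry follows one round.
    """
    rng = np.random.default_rng(seed)
    d = tokens.shape[1]
    curve = [token_spread(tokens)]
    for _ in range(depth):
        # Random projections stand in for learned query/key weights (assumption).
        w_q = rng.normal(size=(d, d)) / np.sqrt(d)
        w_k = rng.normal(size=(d, d)) / np.sqrt(d)
        attn = softmax_rows((tokens @ w_q) @ (tokens @ w_k).T / np.sqrt(d))
        # Each new token is a convex combination of all current tokens.
        tokens = attn @ tokens
        curve.append(token_spread(tokens))
    return curve


x = np.random.default_rng(0).normal(size=(16, 8))  # 16 tokens, 8 features
curve = averaging_spread_curve(x, depth=12)
```

Because each new token is a convex combination of the previous tokens, the per-feature range can never grow from one round to the next, and with strictly positive softmax weights it shrinks. As the tokens converge, the attention scores also flatten, which speeds up the collapse; this is the homogenization effect described in issue (1).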

Paper Structure

This paper contains 41 sections, 2 theorems, 22 equations, 9 figures, 7 tables.

Key Result

Theorem 1

(Frequency-domain convolution theorem) The element-wise multiplication of two signals in the Fourier domain equals the Fourier transform of their convolution in the temporal domain, which can be summarized as

$$\mathcal{F}\big(\mathbf{K}(t) \otimes \mathbf{Z}(t)\big) = \mathcal{F}\big(\mathbf{K}(t)\big) \odot \mathcal{F}\big(\mathbf{Z}(t)\big),$$

where $\mathcal{F}(\cdot)$ denotes the Fourier transform, $\otimes$ and $\odot$ denote the convolution operation and element-wise multiplication, respectively, and $\mathbf{K}(t)$ and $\mathbf{Z}(t)$ represent two signals with respect to time $t$.
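Theorem 1 can be checked numerically for finite, discretely sampled signals, where the discrete Fourier transform pairs with circular convolution. The sketch below assumes two equal-length real sequences and NumPy's unnormalized forward FFT; `circular_convolve` is a hypothetical helper written out directly so the two sides of the identity can be compared.

```python
import numpy as np


def circular_convolve(k: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Direct O(n^2) circular convolution: out[t] = sum_s k[s] * z[(t - s) mod n]."""
    n = len(z)
    return np.array([sum(k[s] * z[(t - s) % n] for s in range(n)) for t in range(n)])


rng = np.random.default_rng(0)
k = rng.normal(size=64)
z = rng.normal(size=64)

lhs = np.fft.fft(circular_convolve(k, z))  # Fourier transform of the temporal convolution
rhs = np.fft.fft(k) * np.fft.fft(z)        # element-wise product of the two spectra
print(np.allclose(lhs, rhs))  # True
```

The practical consequence is that a convolution spanning the whole sequence, which costs $O(n^2)$ when computed directly, can instead be obtained as an element-wise product of spectra in $O(n \log n)$ via the FFT.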

Figures (9)

  • Figure 1: (a) Comparison of the rank of the interaction matrix across encoder layers in vanilla MTM and in our method; (b) comparison of the energy distributions of the raw data, the reconstruction of vanilla MTM, and the reconstruction of our method.
  • Figure 2: (a) Schematic illustration of our two-pronged framework. TED denotes the temporal encoder, while TD and CBD denote the temporal decoder and the frequency decoder (content-aware balanced decoder), respectively. (b) Details of the content-aware balanced decoder.
  • Figure 3: Learned Interaction Matrix on the HAR dataset.
  • Figure 4: Energy rebalancing by SER. The top row depicts the modulation during energy rebalancing, while the bottom row shows the corresponding learned Bernstein polynomials.
  • Figure 5: Learned Interaction Matrix on the PhonemeSpectra dataset.
  • ...and 4 more figures
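Figure 1(b) compares how spectral energy is distributed in the raw data and in the reconstructions. One simple way to summarize such a distribution is to split the one-sided power spectrum into near-equal-width frequency bands and report each band's share of the total energy. The band layout and the function name `band_energy_fractions` below are assumptions for illustration; this section does not specify how the figure was computed.

```python
import numpy as np


def band_energy_fractions(x: np.ndarray, n_bands: int = 4) -> np.ndarray:
    """Share of total spectral energy in each of n_bands frequency bands (low to high).

    Bands are contiguous, near-equal-width groups of one-sided rFFT bins
    (an assumed layout); the shares sum to 1.
    """
    power = np.abs(np.fft.rfft(x)) ** 2
    energies = np.array([band.sum() for band in np.array_split(power, n_bands)])
    return energies / energies.sum()


t = np.arange(256)
# Strong low-frequency sine (bin 4) plus a weak high-frequency sine (bin 100).
x = np.sin(2 * np.pi * 4 * t / 256) + 0.1 * np.sin(2 * np.pi * 100 * t / 256)
print(np.round(band_energy_fractions(x), 3))
```

For this series, about 99% of the energy (exactly $1/1.01$) falls in the lowest band and about 1% in the highest, since the power of a sine scales with the square of its amplitude. Comparing such band shares between the raw input and a reconstruction is one way to expose a low-frequency bias.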

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Definition 1