VSFormer: Value and Shape-Aware Transformer with Prior-Enhanced Self-Attention for Multivariate Time Series Classification

Wenjie Xi, Rundong Zuo, Alejandro Alvarez, Jie Zhang, Byron Choi, Jessica Lin

TL;DR

VSFormer is a value-and-shape-aware Transformer for multivariate time series classification (MTSC). It jointly processes shape tokens (discriminative patterns) and value tokens (statistical features), and embeds class-specific priors in both the encoding and the attention via Time Series Information Encoding (TSI) and Prior-Enhanced Self-Attention (PESA). In experiments on all 30 UEA MTSC datasets, the model is reported to outperform state-of-the-art baselines, and a solar-flare case study shows that it remains robust on data lacking clear discriminative patterns. The approach offers interpretable token-level insights through prioritized shape motifs and interval-based value features, which may benefit practical MTSC tasks where discriminative patterns are weak or absent.

Abstract

Multivariate time series classification is a crucial task in data mining, attracting growing research interest due to its broad applications. While many existing methods focus on discovering discriminative patterns in time series, real-world data does not always present such patterns, and sometimes raw numerical values can also serve as discriminative features. Additionally, the recent success of Transformer models has inspired many studies. However, when applied to time series classification, the self-attention mechanisms in Transformer models could introduce classification-irrelevant features, thereby compromising accuracy. To address these challenges, we propose a novel method, VSFormer, that incorporates both discriminative patterns (shape) and numerical information (value). In addition, we extract class-specific prior information derived from supervised information to enrich the positional encoding and provide classification-oriented self-attention learning, thereby enhancing the model's effectiveness. Extensive experiments on all 30 UEA archived datasets demonstrate the superior performance of our method compared to SOTA models. Through ablation studies, we demonstrate the effectiveness of the improved encoding layer and the proposed self-attention mechanism. Finally, we provide a case study on a real-world time series dataset without discriminative patterns to interpret our model.

Paper Structure

This paper contains 28 sections, 15 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: The overall architecture of VSFormer
  • Figure 2: The shape token generation process (with $k=1$ for clarity). Steps: ① Concatenate sequences per class by variable and identify repeated patterns (highlighted in red and orange). ② Extract prototype shapes. ③ Conduct a similarity search for each instance. ④ Generate a set of shapes alongside their associated distances.
  • Figure 3: Ablation studies showing the comparative performance of our method with different configurations.
  • Figure 4: The heat map illustrates the distribution of the shape weight ($\lambda$) and value weight ($1-\lambda$) for each instance in the AtrialFibrillation dataset.
  • Figure 5: Performance comparison of TST, SVP-T, and VSFormer regarding the accuracy and AUC on the dataset.
  • ...and 4 more figures
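
Steps ③ and ④ in the Figure 2 caption (similarity search for a shape within an instance, returning the matched shape together with its distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `find_best_match`, the use of plain Euclidean distance, and the single-variable input are assumptions made for illustration only.

```python
import numpy as np

def find_best_match(series, prototype):
    """Slide `prototype` over a 1-D `series` and return the closest window.

    Hypothetical helper: Euclidean distance is an assumed similarity
    measure, not necessarily the one used by VSFormer.
    Returns (start_index, matched_window, distance).
    """
    series = np.asarray(series, dtype=float)
    prototype = np.asarray(prototype, dtype=float)
    # All contiguous windows of the same length as the prototype.
    windows = np.lib.stride_tricks.sliding_window_view(series, len(prototype))
    # Distance from the prototype to every window.
    dists = np.linalg.norm(windows - prototype, axis=1)
    start = int(np.argmin(dists))
    return start, windows[start].copy(), float(dists[start])

# Toy instance containing the prototype exactly once.
instance = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
prototype = [1.0, 2.0, 1.0]
start, shape, dist = find_best_match(instance, prototype)
print(start, shape, dist)
```

In this sketch the matched window and its distance together play the role of one candidate shape token; how VSFormer actually encodes them is described in the paper, not here.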