VSFormer: Value and Shape-Aware Transformer with Prior-Enhanced Self-Attention for Multivariate Time Series Classification
Wenjie Xi, Rundong Zuo, Alejandro Alvarez, Jie Zhang, Byron Choi, Jessica Lin
TL;DR
VSFormer is a value- and shape-aware Transformer for multivariate time series classification (MTSC). It jointly processes shape tokens (discriminative patterns) and value tokens (statistical features), and it embeds class-specific priors in both the encoding and the attention through Time Series Information Encoding (TSI) and Prior-Enhanced Self-Attention (PESA). Evaluated on all 30 UEA MTSC datasets, the model shows superior performance compared with state-of-the-art models, and a solar-flare case study shows that it remains robust on data lacking clear discriminative patterns. Prioritized shape motifs and interval-based value features provide interpretable token-level insights, which makes the approach relevant to practical MTSC tasks where discriminative patterns are weak or absent.
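The summary describes value tokens only as interval-based statistical features. To make that idea concrete, here is a minimal sketch of such a tokenizer. It assumes near-equal-length intervals and a hypothetical statistics set (mean, standard deviation, min, max). The function name `interval_value_tokens` and the choice of statistics are illustrative assumptions, not VSFormer's actual feature definition.

```python
import math
from typing import List


def interval_value_tokens(series: List[List[float]], n_intervals: int) -> List[List[float]]:
    """Hypothetical value tokenizer (not the paper's exact definition).

    `series` holds one list per channel. Each channel is split into
    `n_intervals` contiguous intervals, and each interval is summarized by
    (mean, population std, min, max). One token is produced per interval,
    concatenating the statistics of all channels (4 * n_channels features).
    """
    length = len(series[0])
    if any(len(ch) != length for ch in series):
        raise ValueError("all channels must have the same length")
    if not 1 <= n_intervals <= length:
        raise ValueError("n_intervals must be between 1 and the series length")

    # Near-equal boundaries via integer division; interval lengths differ by at most one.
    bounds = [i * length // n_intervals for i in range(n_intervals + 1)]
    tokens = []
    for k in range(n_intervals):
        start, end = bounds[k], bounds[k + 1]
        token = []
        for ch in series:
            seg = ch[start:end]
            mean = sum(seg) / len(seg)
            std = math.sqrt(sum((x - mean) ** 2 for x in seg) / len(seg))
            token.extend([mean, std, min(seg), max(seg)])
        tokens.append(token)
    return tokens
```

For a two-channel series `[[1, 2, 3, 4], [0, 0, 2, 2]]` with two intervals, the first token summarizes `[1, 2]` and `[0, 0]`, and the second summarizes `[3, 4]` and `[2, 2]`. Tokens of this kind carry raw value information even when no discriminative shape exists.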
Abstract
Multivariate time series classification is a crucial task in data mining, attracting growing research interest due to its broad applications. While many existing methods focus on discovering discriminative patterns in time series, real-world data does not always exhibit such patterns, and sometimes raw numerical values can also serve as discriminative features. Additionally, the recent success of Transformer models has inspired many studies. However, when applied to time series classification, the self-attention mechanisms in Transformer models can introduce classification-irrelevant features, thereby compromising accuracy. To address these challenges, we propose a novel method, VSFormer, that incorporates both discriminative patterns (shape) and numerical information (value). In addition, we derive class-specific prior information from the supervised (label) information and use it to enrich the positional encoding and to guide self-attention learning toward classification, which improves the model's effectiveness. Extensive experiments on all 30 datasets in the UEA archive demonstrate the superior performance of our method compared with SOTA models. Through ablation studies, we demonstrate the effectiveness of the improved encoding layer and the proposed self-attention mechanism. Finally, we present a case study on a real-world time series dataset without discriminative patterns to interpret our model.
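The abstract says that class-specific priors steer self-attention toward classification but does not give the PESA formula. One generic way to inject a prior into attention is an additive bias on the attention logits, softmax(QK^T / sqrt(d) + alpha * P) V. The sketch below illustrates that general idea only. The prior matrix `P`, the weight `alpha`, and the function name `prior_biased_attention` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def prior_biased_attention(
    q: Matrix, k: Matrix, v: Matrix, prior: Matrix, alpha: float = 1.0
) -> Tuple[Matrix, Matrix]:
    """Generic attention with an additive prior on the logits (hypothetical,
    not VSFormer's PESA): softmax(Q K^T / sqrt(d) + alpha * P) V.

    prior[i][j] scores how much query token i should attend to key token j.
    Returns (outputs, attention_weights).
    """
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        logits = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + alpha * prior[i][j]
            for j, kj in enumerate(k)
        ]
        w = softmax(logits)
        weights.append(w)
        # Weighted sum of value vectors, one output feature at a time.
        outputs.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return outputs, weights
```

With a zero prior, this reduces to ordinary scaled dot-product attention. A large positive prior entry concentrates attention on the corresponding token regardless of the query-key similarity, which is the kind of steering that supervised priors could provide.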
