Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting

Yanjun Zhao, Tian Zhou, Chao Chen, Liang Sun, Yi Qian, Rong Jin

TL;DR

Sparse-VQ introduces an FFN-free Transformer for time series forecasting that combines Reverse Instance Normalization (RevIN) with a sparse vector quantization (SVQ) module to robustly capture statistics under non-stationary distributions and noise. By reconstructing inputs as sparse combinations of learned codebook vectors, the model reduces parameter count by about 21% and eliminates the traditional Feed-Forward Network, while maintaining or surpassing state-of-the-art accuracy across ten benchmarks including CAISO. Empirical results show 7.84% and 4.17% MAE improvements for univariate and multivariate forecasting respectively, and the approach provides a flexible plug-in to boost other Transformer backbones like FEDformer and Autoformer. The work highlights the importance of component-wise analysis of transformers in time series and offers a practical, efficient alternative tailored to drifting statistics and noisy data.

Abstract

Time series analysis is vital for numerous applications, and transformers have become increasingly prominent in this domain. Leading methods customize the transformer architecture from NLP and CV, utilizing a patching technique to convert continuous signals into segments. Yet, time series data are uniquely challenging due to significant distribution shifts and intrinsic noise levels. To address these two challenges, we introduce the Sparse Vector Quantized FFN-Free Transformer (Sparse-VQ). Our methodology capitalizes on a sparse vector quantization technique coupled with Reverse Instance Normalization (RevIN) to reduce noise impact and capture sufficient statistics for forecasting, serving as an alternative to the Feed-Forward layer (FFN) in the transformer architecture. Our FFN-free approach trims the parameter count, enhancing computational efficiency and reducing overfitting. Through evaluations across ten benchmark datasets, including the newly introduced CAISO dataset, Sparse-VQ surpasses leading models with a 7.84% and 4.17% decrease in MAE for univariate and multivariate time series forecasting, respectively. Moreover, it can be seamlessly integrated with existing transformer-based models to elevate their performance.
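
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which sparse solver is used, so greedy matching pursuit is swapped in as a simple stand-in, the learnable affine parameters commonly paired with RevIN are omitted, and the function names, the toy codebook, and the `num_nonzero` parameter are all hypothetical.

```python
import math

def revin_normalize(window, eps=1e-5):
    """Per-instance normalization: remove the window's own mean and scale."""
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)
    std = math.sqrt(var + eps)
    return [(v - mean) / std for v in window], mean, std

def revin_denormalize(values, mean, std):
    """Reverse step: restore the statistics removed by revin_normalize."""
    return [v * std + mean for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sparse_reconstruct(x, codebook, num_nonzero):
    """Greedy matching pursuit over unit-norm codebook vectors (assumed solver).

    Returns (coefficients, reconstruction); at most `num_nonzero`
    coefficients end up non-zero.
    """
    residual = list(x)
    coeffs = [0.0] * len(codebook)
    for _ in range(num_nonzero):
        # Pick the codebook vector most correlated with what is still unexplained.
        scores = [dot(residual, atom) for atom in codebook]
        best = max(range(len(codebook)), key=lambda j: abs(scores[j]))
        coeffs[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, codebook[best])]
    reconstruction = [xi - ri for xi, ri in zip(x, residual)]
    return coeffs, reconstruction

# Toy example: a hypothetical 3-entry codebook of unit-norm vectors.
s = 1 / math.sqrt(2)
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, s, s]]
window = [10.0, 12.0, 14.0, 16.0]

normed, mean, std = revin_normalize(window)
coeffs, recon = sparse_reconstruct(normed, codebook, num_nonzero=2)
restored = revin_denormalize(recon, mean, std)  # back on the original scale
```

In contrast to classic vector quantization, which snaps each input to a single nearest codeword, the reconstruction here is a weighted sum of a few codewords. Normalizing before and de-normalizing after keeps the codebook independent of each window's level and scale.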

Paper Structure

This paper contains 37 sections, 1 theorem, 5 equations, 9 figures, 24 tables, 1 algorithm.

Key Result

Proposition 1

For a cluster-based scheme, $N(\mathcal{U}, \epsilon)$ is no less than $1/\epsilon^n$, whereas for the sparse regression technique, $N(\mathcal{U},\epsilon)$ has an upper bound of $(4n/\epsilon)^q$, provided that the count of non-zero coefficients used in sparse regression is at least
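
To get a feel for the gap between the two bounds, they can be evaluated on a log scale. This is a rough numerical illustration only: the values of `n`, `eps`, and `q` below are hypothetical, `q` is treated as a free parameter, and the condition on the number of non-zero coefficients (cut off in the statement above) is not checked.

```python
import math

def log10_cluster_lower_bound(n, eps):
    """log10 of 1/eps^n, the lower bound quoted for cluster-based schemes."""
    return -n * math.log10(eps)

def log10_sparse_upper_bound(n, eps, q):
    """log10 of (4n/eps)^q, the upper bound quoted for sparse regression."""
    return q * math.log10(4 * n / eps)

# Hypothetical setting, chosen only for illustration.
n, eps, q = 64, 0.1, 4

cluster = log10_cluster_lower_bound(n, eps)
sparse = log10_sparse_upper_bound(n, eps, q)

print(f"cluster-based:     N >= 10^{cluster:.1f}")
print(f"sparse regression: N <= 10^{sparse:.1f}")
```

With these values the cluster-based lower bound is on the order of $10^{64}$, while the sparse-regression upper bound is $2560^4 \approx 4.3 \times 10^{13}$. The first grows exponentially in $n$; the second grows only polynomially in $n$ for fixed $q$.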

Figures (9)

  • Figure 1: Although the temporal covariate shift problem exists in non-stationary time series, as shown in (a), historical patterns may still reoccur in the future. For example, the distribution of $L_{2}$ (b) is similar to that of $L_{5}$ (c).
  • Figure 2: Sparse-VQ Block (SVQ).
  • Figure 3: Classic model with incorporation of VQ (left) and Sparse-VQ Model Overview (right).
  • Figure 4: Distribution of embedding weight. The Sparse-VQ framework encourages denser weights.
  • Figure 5: Distribution of codebook. Sparse-VQ encourages a sparser codebook with a wider range of perception.
  • ...and 4 more figures

Theorems & Definitions (1)

  • Proposition 1