EMAformer: Enhancing Transformer through Embedding Armor for Time Series Forecasting

Zhiwei Zhang, Xinyi Du, Xuanchi Guo, Weihao Wang, Wenjuan Han

TL;DR

The paper addresses unstable inter-channel dynamics in multivariate time series forecasting with Transformers. It proposes EMAformer, which injects three inductive biases via channel, phase, and joint channel-phase embeddings into variate-tokenized inputs, preserving the backbone architecture. Empirically, EMAformer achieves state-of-the-art results on 12 real-world benchmarks, with average improvements of 2.73% in MSE and 5.15% in MAE, notably excelling on high-channel datasets and challenging PEMS tasks. This approach offers a practical, architecture-agnostic path to enhance Transformer-based time series forecasting, with publicly available code for reproducibility.

Abstract

Multivariate time series forecasting is crucial across a wide range of domains. While presenting notable progress for the Transformer architecture, iTransformer still lags behind the latest MLP-based models. We attribute this performance gap to unstable inter-channel relationships. To bridge this gap, we propose EMAformer, a simple yet effective model that enhances the Transformer with an auxiliary embedding suite, akin to armor that reinforces its ability. By introducing three key inductive biases, i.e., global stability, phase sensitivity, and cross-axis specificity, EMAformer unlocks the further potential of the Transformer architecture, achieving state-of-the-art performance on 12 real-world benchmarks and reducing forecasting errors by an average of 2.73% in MSE and 5.15% in MAE. This significantly advances the practical applicability of Transformer-based approaches for multivariate time series forecasting. The code is available on https://github.com/PlanckChang/EMAformer.
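
To make the embedding suite more concrete, here is a minimal NumPy sketch of how channel, phase, and joint channel-phase embeddings could be attached to variate tokens. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The assumptions: the embeddings are combined by addition, random tables stand in for learned parameters, the period is 24 steps, the phase is taken from the absolute index of the window's first step via the modulo operator, and every name (embed_window, W_tok, E_channel, E_phase, E_joint) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: channels, lookback length, model width, period length.
C, L, D, P = 4, 96, 16, 24

# Random stand-ins for parameters that would be learned in a real model.
W_tok = rng.normal(size=(L, D)) * 0.02       # linear variate tokenizer
E_channel = rng.normal(size=(C, D)) * 0.02   # one vector per channel
E_phase = rng.normal(size=(P, D)) * 0.02     # one vector per phase
E_joint = rng.normal(size=(C, P, D)) * 0.02  # one vector per (channel, phase)


def embed_window(x, t_start):
    """Return (C, D) encoder inputs for one lookback window.

    x: (L, C) window of observations.
    t_start: absolute time index of the window's first step (assumption).
    """
    tokens = x.T @ W_tok      # (C, D): one token per variate
    phase = t_start % P       # position within the period
    # Assumed combination rule: element-wise sum of all four terms.
    return tokens + E_channel + E_phase[phase] + E_joint[:, phase]


x = rng.normal(size=(L, C))
encoder_input = embed_window(x, t_start=5)   # shape (C, D)
```

Under these assumptions the auxiliary terms depend only on the channel identity and the phase, so two windows that start one full period apart receive identical offsets, while the Transformer encoder that consumes encoder_input is left unchanged.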

Paper Structure

This paper contains 37 sections, 16 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Coefficients of variation (CoV) of the inter-channel correlations. We compute CoV among channels by first measuring the correlations within each day and then computing the mean and standard deviation across days. A CoV value greater than 1 signals substantial variability, indicating that local inter-channel relationships are unstable. These findings imply that vanilla self-attention mechanisms struggle with such rapidly changing dynamics.
  • Figure 2: Overview of EMAformer. We enhance a Transformer within the variate tokenization framework by integrating three types of auxiliary embeddings: (1) channel embeddings to capture the global representation and stabilize local inter-channel relations; (2) phase embeddings to restore temporal detail and enhance phase sensitivity; and (3) joint channel-phase embeddings to capture intricate dependencies across the channel and temporal dimensions. These are combined with variate token embeddings and processed by the Transformer encoder, effectively augmenting its capacity without modifying its core architecture. The notation % denotes the modulo operator.
  • Figure 3: Ablation study. The legends omit the word "embedding" for brevity. Our complete model achieves the best overall performance.
  • Figure 4: Comparison of our embedding strategy by replacing the Transformer backbone with MLP, against the strongest existing MLP baselines. Our MLP variant outperforms other MLP baselines across almost all datasets, underscoring the effectiveness of our embedding design.
  • Figure 5: Replacing the input series with its mean values to eliminate the history information. Our model still matches or outperforms some baselines.
  • ...and 4 more figures
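
The CoV statistic described in the Figure 1 caption can be sketched as follows. This is a hedged reconstruction from the caption alone: it assumes Pearson correlation within each day, takes CoV as the standard deviation divided by the absolute mean across days, and adds a small epsilon to avoid division by zero. The function name inter_channel_cov and the steps_per_day parameter are hypothetical.

```python
import numpy as np


def inter_channel_cov(series, steps_per_day, eps=1e-8):
    """Coefficient of variation of daily inter-channel correlations.

    series: (T, C) array of T time steps and C channels.
    Returns a (C, C) array; entry (i, j) is std / |mean| of the
    per-day correlation between channels i and j.
    """
    T, C = series.shape
    n_days = T // steps_per_day
    # Drop the incomplete trailing day, then split into days.
    days = series[: n_days * steps_per_day].reshape(n_days, steps_per_day, C)
    # One (C, C) Pearson correlation matrix per day.
    corrs = np.stack([np.corrcoef(day, rowvar=False) for day in days])
    mean = corrs.mean(axis=0)
    std = corrs.std(axis=0)
    return std / (np.abs(mean) + eps)


# Toy data: channel 1 tracks channel 0 every day, while channel 2
# flips its relationship with channel 0 from one day to the next.
rng = np.random.default_rng(0)
n_days, steps = 10, 24
base = rng.normal(size=n_days * steps)
signs = np.repeat(np.where(np.arange(n_days) % 2 == 0, 1.0, -1.0), steps)
data = np.stack([base, 2 * base + 1, base * signs], axis=1)
cov = inter_channel_cov(data, steps)
```

In this toy setup the stable pair (0, 1) has a CoV close to zero, while the sign-flipping pair (0, 2) has a CoV far above 1, which is the regime the caption describes as unstable.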