Sequence Complementor: Complementing Transformers For Time Series Forecasting with Learnable Sequences
Xiwen Chen, Peijie Qiu, Wenhui Zhu, Huayu Li, Hao Wang, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi
TL;DR
This work identifies representation richness as a key limiting factor for Transformer-based time series forecasting. It introduces Sequence Complementor, a learnable set of complementary sequences appended to the input to expand feature diversity, supported by information-theoretic arguments that higher encoder entropy can lower the MMSE bound. A differentiable diversification loss further encourages the complementary sequences to remain distinct and informative. Empirically, the method achieves state-of-the-art results on both long-term and short-term forecasting across multiple datasets. It is also model-agnostic, as demonstrated by improvements to iTransformer, and ablations consistently show the benefits of using three complementors and of diversification.
Abstract
Since its introduction, the Transformer has shifted the development trajectory of time series forecasting away from traditional models (e.g., RNN, MLP), a shift attributed to its ability to capture global dependencies among temporal tokens. Follow-up studies have largely altered the tokenization and self-attention modules to better adapt Transformers to specific challenges in time series, such as non-stationarity, channel-wise dependency, and variable correlation. However, after investigating several representative methods, we found that the expressive capability of the sequence representation is a key factor influencing Transformer performance in time series forecasting: there is an almost linear relationship between sequence representation entropy and mean squared error, with more diverse representations performing better. In this paper, we propose a novel attention mechanism with Sequence Complementors and prove its feasibility from an information-theoretic perspective; these learnable sequences provide complementary information beyond the current input to feed attention. We further enhance the Sequence Complementors with a diversification loss that is theoretically grounded. Empirical evaluation on both long-term and short-term forecasting confirms its superiority over recent state-of-the-art methods.
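To make the idea of feeding attention with extra sequences concrete, the following is a minimal sketch, not the paper's implementation. It assumes single-head scaled dot-product attention with identity query/key/value projections, and it assumes the complementors simply extend the key/value set that the input tokens attend to. The function names `attend_with_complementors` and `diversity_penalty` are hypothetical. The penalty shown (mean squared cosine similarity between complementors) is a stand-in chosen for illustration; the abstract does not specify the form of the diversification loss.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend_with_complementors(tokens, complementors):
    """Scaled dot-product attention with extra key/value rows.

    tokens:        list of N input token vectors, each of length d
    complementors: list of K vectors of length d (learnable parameters
                   in a real model; plain constants in this sketch)
    Returns N output vectors, one per input token.
    Assumption: identity Q/K/V projections, single head.
    """
    d = len(tokens[0])
    memory = tokens + complementors  # concatenate along the sequence axis
    outputs = []
    for query in tokens:  # only the original tokens are read out
        weights = softmax([dot(query, key) / math.sqrt(d) for key in memory])
        outputs.append([
            sum(w * row[c] for w, row in zip(weights, memory))
            for c in range(d)
        ])
    return outputs


def diversity_penalty(complementors):
    """Mean squared cosine similarity over distinct pairs (stand-in loss).

    0.0 when the complementors are mutually orthogonal, 1.0 when they
    are all collinear. Returns 0.0 for fewer than two complementors.
    Assumes every complementor is a nonzero vector.
    """
    sims = []
    for i in range(len(complementors)):
        for j in range(i + 1, len(complementors)):
            u, v = complementors[i], complementors[j]
            cos = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
            sims.append(cos * cos)
    return sum(sims) / len(sims) if sims else 0.0


# Example: two input tokens, two complementors of the same width.
tokens = [[1.0, 0.0], [0.0, 1.0]]
complementors = [[1.0, 1.0], [1.0, -1.0]]
outputs = attend_with_complementors(tokens, complementors)
penalty = diversity_penalty(complementors)
```

In a trainable version, the complementors would be parameters updated by gradient descent, and a penalty of this kind would be added to the forecasting loss with a weighting coefficient, so that the extra sequences are pushed apart rather than collapsing onto one another.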
