Sequence Complementor: Complementing Transformers For Time Series Forecasting with Learnable Sequences

Xiwen Chen; Peijie Qiu; Wenhui Zhu; Huayu Li; Hao Wang; Aristeidis Sotiras; Yalin Wang; Abolfazl Razi


TL;DR

This work identifies representation richness as a key limiter for Transformer-based time series forecasting. It introduces Sequence Complementor, a learnable set of complementary sequences appended to the input to expand feature diversity, supported by information-theoretic arguments that higher encoder entropy can lower the MMSE bound. A differentiable diversification loss further encourages complementary sequences to remain distinct and informative. Empirically, the method achieves state-of-the-art results on both long-term and short-term forecasting across multiple datasets and is model-agnostic, as demonstrated by improvements to iTransformer and consistent ablations showing the benefits of three complementors and diversification.
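The idea of appending learnable sequences to the input can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the single-head attention, the function name `attention_with_complementors`, the shapes, and the small random initialization are all assumptions made for the example, and in practice the complementors would be trained jointly with the model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_complementors(tokens, complementors, w_q, w_k, w_v):
    """Single-head self-attention over the input extended with extra sequences.

    tokens:        (n, d) input sequence tokens
    complementors: (m, d) learnable sequences (hypothetical parameterization)
    Returns the (n + m, d_v) output and the (n + m, n + m) attention weights.
    """
    # Append the complementary sequences to the input tokens.
    extended = np.concatenate([tokens, complementors], axis=0)
    queries, keys, values = extended @ w_q, extended @ w_k, extended @ w_v
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
n, m, d = 6, 3, 8  # 6 input tokens, 3 complementors, width 8 (illustrative sizes)
tokens = rng.normal(size=(n, d))
complementors = rng.normal(size=(m, d)) * 0.02  # assumed initialization
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

out, weights = attention_with_complementors(tokens, complementors, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Every input token can attend to the appended sequences, which is how they can contribute information beyond the current input.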

Abstract

Since its introduction, the Transformer has shifted the development trajectory of time series forecasting away from traditional models (e.g., RNN, MLP), which is attributed to its ability to capture global dependencies among temporal tokens. Follow-up studies have largely altered the tokenization and self-attention modules to better adapt Transformers to challenges specific to time series, such as non-stationarity, channel-wise dependency, and variable correlation. However, after investigating several representative methods, we found that the expressive capability of the sequence representation is a key factor influencing Transformer performance in time series forecasting: there is an almost linear relationship between sequence representation entropy and mean squared error, with more diverse representations performing better. In this paper, we propose a novel attention mechanism with Sequence Complementors and prove its feasibility from an information-theoretic perspective; these learnable sequences provide complementary information beyond the current input to feed attention. We further enhance the Sequence Complementors via a diversification loss that is theoretically justified. Empirical evaluation on both long-term and short-term forecasting confirms its superiority over recent state-of-the-art methods.
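The abstract does not give the form of the diversification loss. As a rough illustration of what a differentiable diversity term can look like, the sketch below uses a stand-in: the mean squared off-diagonal cosine similarity between the complementary sequences. This is a swapped-in technique chosen for simplicity, not the paper's loss, and the name `diversification_penalty` is hypothetical.

```python
import numpy as np

def diversification_penalty(complementors, eps=1e-8):
    """Mean squared pairwise cosine similarity between distinct sequences.

    complementors: (m, d) array with m >= 2.
    Stand-in penalty (not the paper's loss): 0 when all sequences are
    mutually orthogonal, close to 1 when they all point the same way.
    """
    norms = np.linalg.norm(complementors, axis=1, keepdims=True)
    unit = complementors / (norms + eps)
    gram = unit @ unit.T                         # pairwise cosine similarities
    off_diag = gram - np.diag(np.diag(gram))     # drop self-similarity
    m = gram.shape[0]
    return float((off_diag ** 2).sum() / (m * (m - 1)))

identical = np.tile(np.array([[1.0, 2.0, 3.0]]), (3, 1))  # three copies of one sequence
orthogonal = np.eye(3)                                    # three orthogonal sequences

p_identical = diversification_penalty(identical)
p_orthogonal = diversification_penalty(orthogonal)
print(p_identical, p_orthogonal)
```

Adding a term of this kind to the forecasting loss would push the learned sequences apart, so that each one contributes something the others do not.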


Paper Structure (47 sections, 6 theorems, 27 equations, 15 figures, 9 tables, 2 algorithms)


Introduction
Preliminaries
The Analysis of Transformers in Time Series Forecasting
Sequence Complementor
Learnable Complementary Sequence
Complexity analysis.
Theoretical Justification
Justification 1.
Justification 2.
Diversified Complementary Sequence
Learning objective.
Experiments and Results
Experimental setup.
Main Results on Long-term Forecasting.
Main Results on Short-term Forecasting.
...and 32 more sections

Key Result

Lemma 1

Under the Gaussian assumption, the minimum mean-squared error (MMSE) is bounded in terms of the conditional entropy, denoted $H(\cdot|\cdot)$.
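For orientation, a standard bound of this kind is the estimation-error bound from information theory. It is given here in generic notation, which is an assumption and may not match the paper's symbols or exact statement: $Y$ is a scalar target, $\hat{Y}(Z)$ is any estimator built from a representation $Z$, and $h(Y \mid Z)$ is the conditional differential entropy in nats.

```latex
\mathbb{E}\!\left[\big(Y - \hat{Y}(Z)\big)^{2}\right]
\;\ge\;
\frac{1}{2\pi e}\, e^{\,2\,h(Y \mid Z)}
```

Equality holds when $Y$ and $Z$ are jointly Gaussian and $\hat{Y}(Z)$ is the conditional mean. A representation that leaves less uncertainty about the target, i.e. a smaller $h(Y \mid Z)$, therefore lowers the bound.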

Figures (15)

Figure 1: The vanilla self-attention mechanism vs. the self-attention mechanism with the proposed learnable Sequence Complementors, which serve as complementary sequences to the original input sequence (left). Integrating Sequence Complementors yields richer learned representations and better forecasting performance (right).
Figure 2: The analysis of transformers for time series forecasting: (a) the correlation of the learned latent representations from the encoder ($\boldsymbol{Z}_{enc}$), (b) the ratio of dominant singular value against MSE, and (c) the feature entropy against MSE.
Figure 3: Qualitative results on the ETTh2 dataset.
Figure 4: Ablation studies on the number of learnable Sequence Complementors and on diversified Sequence Complementors across datasets. The results suggest that overall performance is best when the number of complementors is 3.
Figure 5: The comparison of training dynamics with and without Sequence Complementors.
...and 10 more figures
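The two representation diagnostics named in the Figure 2 caption can be approximated from the singular values of an encoder output matrix. The sketch below is an assumption-laden proxy rather than the paper's measurement: the function names are hypothetical, the dominant ratio is taken as the largest singular value over the sum of all singular values, and "feature entropy" is approximated by the Shannon entropy of the normalized singular value spectrum.

```python
import numpy as np

def dominant_singular_ratio(z):
    # Share of the largest singular value in the total spectrum (assumed definition).
    s = np.linalg.svd(z, compute_uv=False)
    return float(s[0] / s.sum())

def spectral_entropy(z, eps=1e-12):
    # Shannon entropy (nats) of the normalized singular values, used here
    # as a proxy for representation diversity.
    s = np.linalg.svd(z, compute_uv=False)
    p = s / s.sum()
    return float(-(p * np.log(p + eps)).sum())

# A rank-one matrix (every column identical) versus a random Gaussian matrix.
rank_one = np.outer(np.arange(1.0, 6.0), np.ones(4))
rng = np.random.default_rng(0)
diverse = rng.normal(size=(5, 4))

for name, z in [("rank_one", rank_one), ("diverse", diverse)]:
    print(name, dominant_singular_ratio(z), spectral_entropy(z))
```

A collapsed representation concentrates its spectrum in one singular value (ratio near 1, entropy near 0), while a more diverse one spreads it out.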

Theorems & Definitions (18)

Definition 1
Remark 1
Proof
Lemma 1
Proof
Remark 2
Theorem 1
Proof
Definition 2
Theorem 2
...and 8 more
