BasisFormer: Attention-based Time Series Forecasting with Learnable and Interpretable Basis

Zelin Ni; Hang Yu; Shizhan Liu; Jianguo Li; Weiyao Lin

TL;DR

BasisFormer addresses the challenge of learning a time-series basis that is both adapted to the data and individually correlated with each series. It introduces a three-part architecture. A self-supervised Basis module learns a consistent, interpretable basis across the historical and future views. A Coef module based on bidirectional cross-attention computes per-series basis coefficients. A Forecast module aggregates the future-view basis vectors using these coefficients to produce predictions. The model optimizes a joint loss $L = L_{pred} + L_{align} + L_{smooth}$, where $L_{align}$ is derived from InfoNCE. Across six datasets, it outperforms previous state-of-the-art methods by 11.04% on univariate and 15.78% on multivariate forecasting. The approach is also robust to hyperparameter choices and runs inference at the millisecond level, which makes it practical for real-time, multi-series forecasting.
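
The joint objective above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It assumes the historical and future views of the N basis vectors have already been projected to a shared embedding dimension d. It uses a standard InfoNCE with cosine similarity and a temperature `tau` (an assumed hyperparameter). For $L_{smooth}$, whose exact form is not given here, it substitutes a hypothetical first-difference penalty. The weights `lam_align` and `lam_smooth` are likewise hypothetical; setting both to 1 recovers the unweighted sum.

```python
import numpy as np


def info_nce(hist_emb: np.ndarray, fut_emb: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE between two views of N basis vectors (assumed setup).

    hist_emb, fut_emb: (N, d) embeddings. Row i of each view forms a positive
    pair; every other row of fut_emb acts as a negative for hist_emb[i].
    """
    # L2-normalise rows so the dot product becomes a cosine similarity.
    h = hist_emb / np.linalg.norm(hist_emb, axis=1, keepdims=True)
    f = fut_emb / np.linalg.norm(fut_emb, axis=1, keepdims=True)
    logits = h @ f.T / tau                        # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; average their negative log-probability.
    return float(-np.mean(np.diag(log_prob)))


def smoothness(basis: np.ndarray) -> float:
    """Hypothetical smoothness penalty: mean squared first difference along time.

    basis: (N, T) array, one basis vector per row.
    """
    return float(np.mean(np.diff(basis, axis=1) ** 2))


def total_loss(l_pred: float, hist_emb: np.ndarray, fut_emb: np.ndarray,
               basis: np.ndarray, lam_align: float = 1.0,
               lam_smooth: float = 1.0) -> float:
    """Joint loss L = L_pred + lam_align * L_align + lam_smooth * L_smooth."""
    return (l_pred
            + lam_align * info_nce(hist_emb, fut_emb)
            + lam_smooth * smoothness(basis))
```

Because row i of each view is the positive pair, the alignment term is near zero when matching rows agree and grows when the rows of one view are shuffled relative to the other.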

Abstract

Bases have become an integral part of modern deep learning-based models for time series forecasting due to their ability to act as feature extractors or future references. To be effective, a basis must be tailored to the specific set of time series data and exhibit a distinct correlation with each time series within the set. However, current state-of-the-art methods are limited in their ability to satisfy both of these requirements simultaneously. To address this challenge, we propose BasisFormer, an end-to-end time series forecasting architecture that leverages learnable and interpretable bases. This architecture comprises three components. First, we acquire bases through adaptive self-supervised learning, which treats the historical and future sections of the time series as two distinct views and employs contrastive learning. Next, we design a Coef module that calculates the similarity coefficients between the time series and bases in the historical view via bidirectional cross-attention. Finally, we present a Forecast module that selects and consolidates the bases in the future view based on the similarity coefficients, resulting in accurate future predictions. Through extensive experiments on six datasets, we demonstrate that BasisFormer outperforms previous state-of-the-art methods by 11.04% and 15.78% for univariate and multivariate forecasting tasks, respectively. Code is available at: https://github.com/nzl5116190/Basisformer
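
The interplay of the Coef and Forecast modules can be illustrated with a single-head, single-layer simplification. In this hypothetical sketch, `series_emb` and `basis_hist_emb` stand for already-embedded historical series and historical basis segments, and the embedding step is omitted. One round of cross-attention runs in each direction with residual updates. The coefficients are then scaled dot products between the updated representations, and the forecast is the tensor dot product of those coefficients with the future basis segments. The paper's actual modules may differ in the number of heads and layers, the projections, and how coefficients are normalized.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(q: np.ndarray, kv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention: rows of q attend over rows of kv."""
    d = q.shape[-1]
    weights = softmax(q @ kv.T / np.sqrt(d), axis=-1)  # (len(q), len(kv))
    return weights @ kv                                  # (len(q), d)


def coef_and_forecast(series_emb: np.ndarray, basis_hist_emb: np.ndarray,
                      basis_future: np.ndarray):
    """Simplified Coef + Forecast step (hypothetical, not the official code).

    series_emb:     (C, d) embeddings of C historical series
    basis_hist_emb: (N, d) embeddings of the historical part of N basis vectors
    basis_future:   (N, O) future part of the basis over an O-step horizon
    """
    # Bidirectional cross-attention, one round with residual updates:
    # series attend to the basis, and the basis attends to the series.
    s = series_emb + cross_attend(series_emb, basis_hist_emb)
    b = basis_hist_emb + cross_attend(basis_hist_emb, series_emb)
    # Similarity coefficient between every series and every basis vector.
    coef = s @ b.T / np.sqrt(s.shape[-1])                          # (C, N)
    # Forecast: coefficient-weighted sum of future basis vectors (tensor dot product).
    forecast = np.tensordot(coef, basis_future, axes=([1], [0]))  # (C, O)
    return coef, forecast
```

For example, with C = 7 series, N = 5 basis vectors, d = 16, and an output length O = 24, `coef` has shape (7, 5) and `forecast` has shape (7, 24). Each predicted series is a linear combination of the future basis segments.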

Paper Structure

This paper contains 25 sections, 8 equations, 6 figures, and 15 tables.

Introduction
Related works
BasisFormer
Coef module for similarity comparison between time series and basis
Forecast module for aggregation and future prediction
Basis module for basis learning
Experiments
Main results
Ablation studies
Other studies
Conclusion
Acknowledgements
Additional Experiments
Experiments on the ETT datasets
Experimental results with longer length input setting

Figures (6)

Figure 1: The architecture of BasisFormer, consisting of the Coef module, the Forecast module, and the Basis module. The green and blue lines denote the data flow of the set of time series and the basis vectors, respectively. The cyan diamond denotes a tensor dot product. Note that the dot-dash line, which denotes the data flow of the future part of the time series, is included only during training and removed during inference.
Figure 2: Two highly correlated basis vectors when the number of basis vectors $N$ is large.
Figure 3: Visualization of the time series and learned basis on the Traffic dataset. The solid line indicates the historical series, and the dashed line indicates the future series. For this visualization, we set the input length $I$ to 96 and the output length $O$ to 96.
Figure 4: Test MSE as a function of the weight on the smoothness loss (red line) and on the InfoNCE loss (blue line).
Figure 5: The corresponding set of attention maps from the past and future perspectives.
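
Figure 2's observation suggests a simple diagnostic for redundancy among learned basis vectors: compute their pairwise Pearson correlation and report the most correlated distinct pair. The helper below, `most_correlated_pair`, is hypothetical and not part of the BasisFormer code. It assumes the basis is stored as an (N, T) array with one non-constant vector per row, because constant rows make `np.corrcoef` return NaN.

```python
import numpy as np


def most_correlated_pair(basis: np.ndarray):
    """Return (i, j, r) for the distinct basis vectors with the largest |Pearson r|.

    basis: (N, T) array, one non-constant basis vector per row (hypothetical layout).
    """
    corr = np.corrcoef(basis)          # (N, N) Pearson correlation, rows as variables
    np.fill_diagonal(corr, 0.0)        # ignore each vector's correlation with itself
    i, j = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    return int(i), int(j), float(corr[i, j])
```

A value of |r| close to 1 would flag the kind of near-duplicate basis vectors that Figure 2 shows when $N$ is large.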
