TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables
Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Guo Qin, Haoran Zhang, Yong Liu, Yunzhong Qiu, Jianmin Wang, Mingsheng Long
TL;DR
TimeXer targets time series forecasting with exogenous variables by extending a canonical Transformer with separate endogenous and exogenous embeddings and a learnable global endogenous token that bridges external information into the endogenous patches. It applies patch-level self-attention to the endogenous history and variate-level cross-attention to the exogenous signals, achieves state-of-the-art results on twelve real-world datasets, and remains robust to irregular exogenous data and in large-scale settings. The design stays efficient by avoiding costly interactions among exogenous tokens, supports parallel multivariate forecasting, and yields interpretable attention patterns that link exogenous factors to endogenous dynamics. Overall, TimeXer offers a practical, generalizable framework for using external information in time series forecasting without modifying the Transformer backbone.
Abstract
Deep models have demonstrated remarkable performance in time series forecasting. However, due to the partially observed nature of real-world applications, focusing solely on the target of interest, the so-called endogenous variables, is usually insufficient to guarantee accurate forecasting. Notably, a system is often recorded as multiple variables, where the exogenous variables can provide valuable external information for the endogenous variables. Thus, unlike well-established multivariate or univariate forecasting paradigms that either treat all variables equally or ignore exogenous information, this paper focuses on a more practical setting: time series forecasting with exogenous variables. We propose a novel approach, TimeXer, to ingest external information to enhance the forecasting of endogenous variables. With deftly designed embedding layers, TimeXer empowers the canonical Transformer with the ability to reconcile endogenous and exogenous information, where patch-wise self-attention and variate-wise cross-attention are used simultaneously. Moreover, global endogenous tokens are learned to effectively bridge the causal information underlying exogenous series into endogenous temporal patches. Experimentally, TimeXer achieves consistent state-of-the-art performance on twelve real-world forecasting benchmarks and exhibits notable generality and scalability. Code is available at this repository: https://github.com/thuml/TimeXer.
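Token layout (illustrative sketch)
The embedding design described above can be made concrete with a little token bookkeeping. The sketch below is not taken from the TimeXer repository: the helper names `patchify` and `token_layout`, the lookback of 96, the patch length of 16, and the eight exogenous series are all assumptions chosen for illustration. It also assumes that only the global endogenous token queries the exogenous variate tokens in cross-attention, which is one reading of how the global token "bridges" the two sides. The last entry is a hypothetical comparison in which every series is patched and attended to jointly.

```python
# Illustrative token bookkeeping for a TimeXer-style layer.
# All sizes and helper names here are assumptions, not values from the paper.

def patchify(series, patch_len):
    """Split a 1-D series into non-overlapping patches (any ragged tail is dropped)."""
    n = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n)]


def token_layout(lookback, patch_len, num_exog):
    """Count tokens and attention-score entries for one endogenous variable."""
    num_patches = lookback // patch_len
    endo_tokens = num_patches + 1   # patch tokens plus one learnable global token
    exo_tokens = num_exog           # one variate-level token per exogenous series
    return {
        "endo_tokens": endo_tokens,
        "exo_tokens": exo_tokens,
        # patch-wise self-attention among the endogenous tokens
        "self_attn_scores": endo_tokens ** 2,
        # assumption: only the global token queries the exogenous tokens
        "cross_attn_scores": exo_tokens,
        # hypothetical comparison: patch every series and attend over all patches
        "joint_attn_scores": (num_patches * (1 + num_exog)) ** 2,
    }


endo_history = [float(t) for t in range(96)]   # toy endogenous lookback window
patches = patchify(endo_history, patch_len=16)
layout = token_layout(lookback=96, patch_len=16, num_exog=8)
print(len(patches), layout)
```

Under these assumed sizes, the cross-attention cost grows linearly with the number of exogenous series, while the joint-patching alternative grows quadratically. This is one way to read the claim that the design avoids costly interactions among exogenous tokens.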
