TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables

Yuxuan Wang; Haixu Wu; Jiaxiang Dong; Guo Qin; Haoran Zhang; Yong Liu; Yunzhong Qiu; Jianmin Wang; Mingsheng Long

TL;DR

TimeXer reframes exogenous-aware time series forecasting by extending a canonical Transformer with separate endogenous and exogenous embeddings and a learnable global endogenous token to bridge external information to endogenous patches. Through patch-level self-attention for endogenous history and variate-level cross-attention for exogenous signals, TimeXer achieves state-of-the-art results across twelve real-world datasets and demonstrates robustness to irregular exogenous data and large-scale settings. The approach maintains efficiency by avoiding costly interactions among exogenous tokens and supports parallel multivariate forecasting, with interpretable attention patterns linking exogenous factors to endogenous dynamics. Overall, TimeXer offers a practical, generalizable framework for leveraging external information in time series forecasting without architectural changes to the Transformer backbone.

Abstract

Deep models have demonstrated remarkable performance in time series forecasting. However, due to the partially observed nature of real-world applications, focusing solely on the target of interest, the so-called endogenous variables, is usually insufficient to guarantee accurate forecasting. Notably, a system is often recorded as multiple variables, where the exogenous variables can provide valuable external information for the endogenous variables. Thus, unlike well-established multivariate or univariate forecasting paradigms that either treat all variables equally or ignore exogenous information, this paper focuses on a more practical setting: time series forecasting with exogenous variables. We propose a novel approach, TimeXer, to ingest external information to enhance the forecasting of endogenous variables. With deftly designed embedding layers, TimeXer empowers the canonical Transformer with the ability to reconcile endogenous and exogenous information, where patch-wise self-attention and variate-wise cross-attention are used simultaneously. Moreover, global endogenous tokens are learned to effectively bridge the causal information underlying exogenous series into endogenous temporal patches. Experimentally, TimeXer achieves consistent state-of-the-art performance on twelve real-world forecasting benchmarks and exhibits notable generality and scalability. Code is available at https://github.com/thuml/TimeXer.
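The token flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The sizes `T`, `P`, `C`, `D`, the random weight matrices, and the helper `attention` are all hypothetical. The sketch also assumes that only the global token queries the exogenous variate tokens, which is one reading of the "bridge" role described above. Learned query/key/value projections, multiple heads, layer normalization, feed-forward layers, positional embeddings, and the forecasting head are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention; q: (Nq, d), k and v: (Nk, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Hypothetical sizes: look-back length, patch length, exogenous variates, model dim.
T, P, C, D = 96, 16, 3, 8

endo = rng.standard_normal(T)        # one endogenous series
exo = rng.standard_normal((C, T))    # C exogenous series

# Endogenous embedding: non-overlapping patches become temporal tokens,
# plus one global token (random here, learnable in a real model).
W_patch = rng.standard_normal((P, D)) * 0.1
patch_tokens = endo.reshape(T // P, P) @ W_patch      # (T // P, D)
global_token = rng.standard_normal((1, D)) * 0.1      # (1, D)

# Exogenous embedding: each whole series becomes a single variate token.
W_var = rng.standard_normal((T, D)) * 0.1
variate_tokens = exo @ W_var                          # (C, D)

# Patch-wise self-attention over the temporal tokens and the global token.
endo_tokens = np.concatenate([patch_tokens, global_token], axis=0)
endo_tokens = endo_tokens + attention(endo_tokens, endo_tokens, endo_tokens)[0]

# Variate-wise cross-attention: the global token (last row) is the query,
# the exogenous variate tokens are keys and values (assumption, see lead-in).
update, cross_weights = attention(endo_tokens[-1:], variate_tokens, variate_tokens)
endo_tokens[-1:] = endo_tokens[-1:] + update

print(endo_tokens.shape, cross_weights.shape)
```

Under these assumptions the exogenous tokens never attend to one another, so the cross-attention score matrix has one row and `C` columns. Each row of `cross_weights` sums to one and could be inspected to see which exogenous series the global token weights most heavily.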

Paper Structure (48 sections, 8 equations, 14 figures, 12 tables)

Introduction
Related Work
Transformer-based Time Series Forecaster
Forecasting with Exogenous Variables
TimeXer
Problem Settings
Structure Overview
Endogenous Embedding
Exogenous Embedding
Endogenous Self-Attention
Exogenous-to-Endogenous Cross-Attention
Forecasting Loss
Parallel Multivariate Forecasting
Experiments
Datasets
...and 33 more sections

Figures (14)

Figure 1: Left: The forecasting with exogenous variables paradigm includes inputs from multiple external variables as auxiliary information without the need for forecasting. Right: Model performance comparison on existing electricity price forecasting with exogenous variables benchmarks.
Figure 2: The schematic of TimeXer, which empowers time series forecasting with exogenous variables. (a) The endogenous embedding module yields multiple temporal token embeddings and one global token embedding for the endogenous variable. (b) The exogenous embedding module yields a variate token embedding for each exogenous variable. (c) Self-attention is applied simultaneously over the endogenous temporal tokens and the global token to capture patch-wise dependencies. (d) Cross-attention is applied over endogenous and exogenous variables to integrate external information.
Figure 3: Performance with enlarged look-back lengths chosen from $\{96, 192, 336, 512, 720\}$. Different line styles represent different prediction lengths. In most cases, the forecasting performance benefits from enlarged look-back lengths of both endogenous and exogenous series.
Figure 4: Forecasting performance on large-scale time series datasets. Left: Illustration of the forecasting scenario. The endogenous variable is the temperature collected from weather stations, and the exogenous variables are meteorological indicators from the surrounding $3 \times 3$ grid cells, including the cell that contains the weather station. Each cell provides four types of information: temperature, pressure, and the u- and v-components of wind. Right: TimeXer outperforms other advanced forecasters.
Figure 5: Model analysis of TimeXer. Left: Visualization of the learned attention map, together with the endogenous time series and the exogenous time series with the highest and lowest attention scores. Right: Model efficiency comparison under the forecasting with exogenous variables paradigm on the ECL dataset.
...and 9 more figures
