DeformTime: Capturing Variable Dependencies with Deformable Attention for Time Series Forecasting

Yuxuan Shu; Vasileios Lampos

TL;DR

DeformTime addresses the challenge of leveraging exogenous predictors in multivariate time series (MTS) forecasting by introducing deformable attention blocks that learn inter-variable and intra-variable dependencies. The model employs a variable deformable attention block (V-DAB) and a temporal deformable attention block (T-DAB) within a dual-branch encoder, augmented by a neighbourhood-aware input embedding (NAE) and positional encodings, followed by a GRU-based decoder. Across six real-world datasets, including infectious disease applications with many exogenous predictors, DeformTime achieves an average MAE reduction of 7.2% compared with strong baselines, with larger gains in disease forecasting tasks and longer horizons. The method demonstrates robust performance, scalable memory usage, and clear ablation-supported benefits from modelling variable interactions and multi-granularity temporal patterns, suggesting practical value for real-time MTS monitoring and forecasting.

Abstract

In multivariable time series (MTS) forecasting, existing state-of-the-art deep learning approaches tend to focus on autoregressive formulations and often overlook the potential of using exogenous variables in enhancing the prediction of the target endogenous variable. To address this limitation, we present DeformTime, a neural network architecture that attempts to capture correlated temporal patterns from the input space, and hence, improve forecasting accuracy. It deploys two core operations performed by deformable attention blocks (DABs): learning dependencies across variables from different time steps (variable DAB), and preserving temporal dependencies in data from previous time steps (temporal DAB). Input data transformation is explicitly designed to enhance learning from the deformed series of information while passing through a DAB. We conduct extensive experiments on 6 MTS data sets, using previously established benchmarks as well as challenging infectious disease modelling tasks with more exogenous variables. The results demonstrate that DeformTime improves accuracy against previous competitive methods across the vast majority of MTS forecasting tasks, reducing the mean absolute error by 7.2% on average. Notably, performance gains remain consistent across longer forecasting horizons.
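As a reading aid for the headline figure, the following is a minimal sketch of how a mean absolute error (MAE) and a relative MAE reduction against a baseline could be computed. The function names and the toy numbers are assumptions made for illustration; they are not taken from the paper, and the paper's exact averaging procedure across tasks is not reproduced here.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and forecasts."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mae_reduction(mae_baseline, mae_model):
    """Percentage by which the model's MAE falls below the baseline's MAE."""
    if mae_baseline == 0:
        raise ValueError("baseline MAE must be non-zero")
    return 100.0 * (mae_baseline - mae_model) / mae_baseline


# Toy numbers, invented for illustration only (not results from the paper).
y_true = [2.0, 4.0, 6.0, 8.0]
baseline_forecast = [3.0, 3.0, 7.0, 10.0]
model_forecast = [2.5, 3.5, 6.5, 9.0]

mae_baseline = mean_absolute_error(y_true, baseline_forecast)  # 1.25
mae_model = mean_absolute_error(y_true, model_forecast)        # 0.625
reduction = relative_mae_reduction(mae_baseline, mae_model)    # 50.0
print(f"MAE reduction: {reduction:.1f}%")
```

An average reduction such as the one reported would then, under this assumption, be the mean of such per-task percentages over all forecasting tasks and horizons.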

Paper Structure (51 sections, 16 equations, 16 figures, 9 tables)

Introduction
MTS forecasting task definition
Time series forecasting with DeformTime
Multi-head attention
Neighbourhood-aware input embedding (NAE)
Variable deformable attention block (V-DAB)
Temporal deformable attention block (T-DAB)
Encoder
Encoder-decoder structure
Results
Experimental settings
Forecasting tasks and baseline methods.
DeformTime's setup.
Hyperparameters specific to forecasting tasks and optimisation settings.
Forecasting accuracy
...and 36 more sections

Figures (16)

Figure 1: The architecture of DeformTime. We use the notation introduced in the sections "MTS forecasting task definition" and "Time series forecasting with DeformTime". DeformTime's core modules comprise two deformable attention blocks (DABs), a variable DAB (V-DAB) and a temporal DAB (T-DAB) that respectively capture inter- and intra-variable dependencies. Both DABs reside in the Encoder module. We deploy a 2-layer GRU as the Decoder. Finally, we have visualised key data operations (Segment and Adapt blocks) that take place in the DABs.
Figure 2: 28 days ahead forecasting results for influenza season 2018/19 in England (ILI-ENG) for all models. The black line denotes the ground truth, i.e. the reported (actual) ILI rates.
Figure 3: GPU memory (VRAM) consumption based on (A) the length (time steps) of the look-back window ($L$), and (B) the number of input variables ($C\!+\!1$).
Figure S1: An example of how the training, validation, and test sets are constructed when the test influenza season is 2018/19 (England). The lines in blue, red, and orange colour denote the training, validation, and test periods, respectively. To form the validation set from our training data, we select the period after the peak (outset) from the third to last influenza season, the period around the peak from the penultimate season, and a period around influenza onset from the last season.
Figure S2: 7 days ahead forecasts for all influenza seasons and models for England (ILI-ENG).
...and 11 more figures
