Time-Delayed Transformers for Data-Driven Modeling of Low-Dimensional Dynamics
Albert Alcalde, Markus Widhalm, Emre Yılmaz
TL;DR
The paper addresses learning time-dependent dynamics from partial observations by bridging linear delay-based models and nonlinear transformer architectures. It introduces the time-delayed transformer (TD-TF), a minimal model consisting of single-layer, single-head attention followed by a feedforward map. TD-TF operates on a finite history with time-indexed positional encoding, which gives linear complexity in sequence length while adding nonlinear expressivity. It is shown to be a nonlinear generalization of time-delayed dynamic mode decomposition (TD-DMD); training uses a residual learning form, and long-horizon predictions are produced by autoregressive rollout. Across four case studies (a sinusoidal signal, an unsteady aerodynamic flow, the chaotic Lorenz system, and a reaction-diffusion PDE), TD-TF matches linear baselines in near-linear regimes and substantially outperforms them in nonlinear and chaotic regimes, while preserving interpretability. The approach offers a tractable, transparent path toward trustworthy data-driven modeling of complex dynamical systems, with potential for extension to higher-dimensional and physics-constrained settings.
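To make the linear, delay-based end of this bridge concrete, the following is a minimal sketch of a delay-regression baseline in the spirit of TD-DMD, applied to a sinusoidal signal. It is a scalar simplification and not necessarily the paper's formulation: the helper names (`delay_embed`, `fit_td_dmd`, `rollout`), the window length `d=2`, and the sampling step of 0.05 are all assumptions chosen for illustration.

```python
import numpy as np

def delay_embed(x, d):
    """Row t holds the d consecutive samples [x[t], ..., x[t+d-1]]."""
    return np.stack([x[i:len(x) - d + i] for i in range(d)], axis=1)

def fit_td_dmd(x, d):
    """Least-squares fit of a linear map from a delay window to the next sample."""
    H = delay_embed(x, d)          # shape (N - d, d)
    y = x[d:]                      # next-sample targets, shape (N - d,)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w

def rollout(w, history, steps):
    """Autoregressive prediction: each output is fed back into the window."""
    buf = list(history[-len(w):])
    out = []
    for _ in range(steps):
        nxt = float(np.dot(w, buf[-len(w):]))
        buf.append(nxt)
        out.append(nxt)
    return np.array(out)

# Illustrative data: a sampled sine wave (sampling step is an assumption).
dt = 0.05
t = np.arange(400) * dt
x = np.sin(t)

# A pure sinusoid satisfies a two-term linear recurrence, so d = 2 suffices.
w = fit_td_dmd(x[:300], d=2)
pred = rollout(w, x[:300], steps=100)
err = np.max(np.abs(pred - x[300:]))
```

Because the sampled sine obeys x[t+1] = 2 cos(dt) x[t] - x[t-1], the fitted weights should be close to `[-1, 2*cos(dt)]` and the rollout should track the held-out samples closely. For signals that do not obey such a linear recurrence, a model of this kind is where a nonlinear extension would be expected to help.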
Abstract
We propose the time-delayed transformer (TD-TF), a simplified transformer architecture for data-driven modeling of unsteady spatio-temporal dynamics. TD-TF bridges linear operator-based methods and deep sequence models by showing that a single-layer, single-head transformer can be interpreted as a nonlinear generalization of time-delayed dynamic mode decomposition (TD-DMD). The architecture is deliberately minimal, consisting of one self-attention layer with a single query per prediction and one feedforward layer, resulting in linear computational complexity in sequence length and a small parameter count. Numerical experiments demonstrate that TD-TF matches the performance of strong linear baselines on near-linear systems, while significantly outperforming them in nonlinear and chaotic regimes, where it accurately captures long-term dynamics. Validation studies on synthetic signals, unsteady aerodynamics, the Lorenz '63 system, and a reaction-diffusion model show that TD-TF preserves the interpretability and efficiency of linear models while providing substantially enhanced expressive power for complex dynamics.
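The architecture described above can be sketched in a few lines of NumPy. This is an illustrative forward pass only, with random untrained weights, and it is not the authors' implementation. The parameter names, the additive positional table `pos`, the `tanh` hidden layer, and all dimensions are assumptions. The sketch shows a single query (taken from the most recent state) attending over a finite delay window, so the cost per prediction grows linearly with the window length, followed by a feedforward map applied as a residual update and an autoregressive rollout.

```python
import numpy as np

def init_params(n, d, dk, hidden, rng):
    """Random, untrained parameters (names and shapes are hypothetical)."""
    s = 0.1
    return {
        "pos": s * rng.standard_normal((d, n)),   # assumed additive positional table
        "Wq": s * rng.standard_normal((n, dk)),
        "Wk": s * rng.standard_normal((n, dk)),
        "Wv": s * rng.standard_normal((n, dk)),
        "W1": s * rng.standard_normal((dk, hidden)),
        "b1": np.zeros(hidden),
        "W2": s * rng.standard_normal((hidden, n)),
        "b2": np.zeros(n),
    }

def attention_weights(p, window):
    """One query (latest token) scored against all d tokens: O(d) work."""
    tokens = window + p["pos"]                # (d, n)
    q = tokens[-1] @ p["Wq"]                  # (dk,)
    K = tokens @ p["Wk"]                      # (d, dk)
    scores = K @ q / np.sqrt(q.size)          # (d,)
    scores = scores - scores.max()            # numerically stable softmax
    a = np.exp(scores)
    return a / a.sum(), tokens

def step(p, window):
    """Predict the next state as the latest state plus a learned increment."""
    a, tokens = attention_weights(p, window)
    context = a @ (tokens @ p["Wv"])          # (dk,)
    h = np.tanh(context @ p["W1"] + p["b1"])  # assumed one-hidden-layer map
    return window[-1] + h @ p["W2"] + p["b2"] # residual form

def rollout(p, window, steps):
    """Autoregressive rollout: slide the window and append each prediction."""
    window = window.copy()
    out = []
    for _ in range(steps):
        nxt = step(p, window)
        window = np.vstack([window[1:], nxt])
        out.append(nxt)
    return np.array(out)

rng = np.random.default_rng(0)
n, d = 3, 8                                   # state dimension, window length
params = init_params(n, d, dk=4, hidden=16, rng=rng)
window = rng.standard_normal((d, n))          # placeholder history
weights, _ = attention_weights(params, window)
traj = rollout(params, window, steps=5)
```

Here `weights` holds one attention coefficient per time lag, which is one place where a model of this shape can be inspected directly. In practice the parameters would be fitted to data before any rollout is meaningful.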
