From Complex Dynamics to DynFormer: Rethinking Transformers for PDEs

Pengyu Lai; Yixiao Chen; Dewu Yang; Rui Wang; Feng Wang; Hui Xu

TL;DR

The authors propose DynFormer, a novel dynamics-informed neural operator, and show that embedding first-principles physical dynamics into Transformer architectures yields a highly scalable, theoretically grounded blueprint for PDE surrogate modeling.

Abstract

Partial differential equations (PDEs) are fundamental for modeling complex physical systems, yet classical numerical solvers face prohibitive computational costs in high-dimensional and multi-scale regimes. While Transformer-based neural operators have emerged as powerful data-driven alternatives, they conventionally treat all discretized spatial points as uniform, independent tokens. This monolithic approach ignores the intrinsic scale separation of physical fields, applying computationally prohibitive global attention that redundantly mixes smooth large-scale dynamics with high-frequency fluctuations. Rethinking Transformers through the lens of complex dynamics, we propose DynFormer, a novel dynamics-informed neural operator. Rather than applying a uniform attention mechanism across all scales, DynFormer explicitly assigns specialized network modules to distinct physical scales. It leverages a Spectral Embedding to isolate low-frequency modes, enabling a Kronecker-structured attention mechanism to efficiently capture large-scale global interactions with reduced complexity. Concurrently, we introduce a Local-Global-Mixing transformation. This module utilizes nonlinear multiplicative frequency mixing to implicitly reconstruct the small-scale, fast-varying turbulent cascades that are slaved to the macroscopic state, without incurring the cost of global attention. Integrating these modules into a hybrid evolutionary architecture ensures robust long-term temporal stability. Extensive memory-aligned evaluations across four PDE benchmarks demonstrate that DynFormer achieves up to a 95% reduction in relative error compared to state-of-the-art baselines, while significantly reducing GPU memory consumption. Our results establish that embedding first-principles physical dynamics into Transformer architectures yields a highly scalable, theoretically grounded blueprint for PDE surrogate modeling.
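The two structural ideas named in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation: the helper names (`low_pass`, `kron_apply`, `softmax`), the toy signal, the mode cutoff, and the random attention-like weights are all assumptions chosen for illustration. The sketch shows (1) that truncating Fourier modes isolates the large-scale part of a field and that a pointwise product of band-limited fields regenerates modes above the cutoff, and (2) that a Kronecker-structured operator on a 2D grid can be applied axis by axis without forming the full dense matrix.

```python
import numpy as np

def low_pass(u, n_modes):
    """Keep only the n_modes lowest Fourier modes of a 1D periodic signal."""
    u_hat = np.fft.rfft(u)
    u_hat[n_modes:] = 0.0
    return np.fft.irfft(u_hat, n=u.size)

def softmax(z, axis=-1):
    """Numerically stable softmax, used here to build attention-like weights."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kron_apply(A_x, A_y, U):
    """Apply kron(A_x, A_y) to the row-major flattening of U without building it."""
    return A_x @ U @ A_y.T

# (1) Scale separation and multiplicative frequency mixing (toy example).
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
u = np.cos(2 * x) + np.cos(3 * x) + 0.1 * np.cos(20 * x)
u_large = low_pass(u, n_modes=4)            # retains modes 0..3 only
large_hat = np.abs(np.fft.rfft(u_large)) / x.size
mixed_hat = np.abs(np.fft.rfft(u_large * u_large)) / x.size
# The product contains the sum frequency 2 + 3 = 5, which lies above the cutoff.

# (2) Kronecker-structured operator on a small 2D grid (hypothetical weights).
rng = np.random.default_rng(0)
nx, ny = 6, 5
A_x = softmax(rng.normal(size=(nx, nx)))    # row-stochastic weights along x
A_y = softmax(rng.normal(size=(ny, ny)))    # row-stochastic weights along y
U = rng.normal(size=(nx, ny))
fast = kron_apply(A_x, A_y, U)              # two small matrix products
dense = (np.kron(A_x, A_y) @ U.ravel()).reshape(nx, ny)  # full (nx*ny)^2 matrix
```

In this toy setting the axis-wise application multiplies by an nx-by-nx and an ny-by-ny matrix, whereas the dense form needs an (nx·ny)-by-(nx·ny) matrix. That difference is the kind of saving a Kronecker structure is meant to provide, although the paper's actual attention mechanism may differ in its details.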

Paper Structure (37 sections, 21 equations, 5 figures, 7 tables)

Introduction
Methodology
Scale Decomposition of Complex Dynamics
Large-Scale and Small-Scale Dynamics
Mapping from Large- to Small-Scale Dynamics
Scale Decomposition via Fourier Modes
Modeling Large-Scale Interaction via Kronecker-Structured Attention
Attention as a Learnable Kernel Operator
Spectral Dynamics Embedding
Kronecker-Structured Attention Mechanism
Recovering Full-Scale Dynamics via Mixing Transformations
Local-Global-Mixing Transformation
Integration into Full-Scale Dynamics Layers (FSDL)
Evolutionary Operator with DynFormer
Full-Scale Dynamics Layer (FSDL) Internal Routing
...and 22 more sections

Figures (5)

Figure 1: Overview of DynFormer's performance across four PDE benchmarks and baselines' peak GPU memory consumption.
Figure 2: Illustration of the DynFormer architecture.
Figure 3: Performance comparison of neural operator architectures across diverse PDE benchmarks. (a) Aggregated Log-Min-Max normalized scores (0–100) averaged over all benchmarks, model variants (Tiny/Medium/Large), and random seeds, highlighting DynFormer's substantial lead over state-of-the-art baselines. (b) Per-benchmark breakdown showing DynFormer's dominance across 1D Kuramoto-Sivashinsky, 2D Darcy, 2D Navier-Stokes, and 3D Shallow Water equations.
Figure 4: Qualitative visualization and error analysis on the 2D Navier-Stokes benchmark. Top row: Predicted vorticity fields compared against Ground Truth. DynFormer captures fine-scale turbulent structures with high fidelity, whereas baselines exhibit smoothing or artifacts. Bottom left: Error evolution over simulation timesteps, demonstrating DynFormer's stability. Bottom right: Mean Squared Error (MSE) comparison, where DynFormer achieves an order-of-magnitude reduction compared to competing methods.
Figure 5: Computational efficiency and memory–performance trade-offs. (a) Peak GPU memory consumption (MB) for Tiny, Medium, and Large variants. All baselines are built with comparable memory consumption. (b) Score per Memory ratio, illustrating DynFormer's superior efficiency, which reaches 9.6. (c) Scalability analysis across Tiny, Medium, and Large model variants. DynFormer consistently achieves higher performance scores for equivalent or lower memory usage, with its Tiny variant outperforming the Large variants of several baselines.
