TANTE: Time-Adaptive Operator Learning via Neural Taylor Expansion

Zhikai Wu, Sifan Wang, Shiyang Zhang, Sizhuang He, Min Zhu, Anran Jiao, Lu Lu, David van Dijk

TL;DR

This work addresses the challenge of time-dependent PDE operator learning with fixed time steps, which can cause error accumulation and inefficiency during rollout. It introduces TANTE, a Time-Adaptive Transformer that uses Neural Taylor Expansion to predict multiple temporal derivatives and a local convergence radius, enabling continuous-time predictions with adaptive step sizes via a Taylor-series rollout $\tilde{\mathbf{u}}(t)=\mathbf{u}(0)+\sum_{k=1}^{n}\tilde{\mathbf{u}}^{(k)}(0)\, t^{k}/k!$ within $[0,\tilde{r}_t]$. The architecture comprises a spatiotemporal encoder, a Transformer Processor that estimates derivatives up to order $n$, and a spatiotemporal decoder that outputs the derivatives and $\tilde{r}_t$, with a regularization term to prevent degenerate radii. Empirically, TANTE achieves state-of-the-art predictive accuracy and efficiency across four challenging PDE benchmarks, demonstrates robust scalability with model size and expansion order, and reveals meaningful adaptivity of the convergence radius across system parameters and trajectories, reducing error accumulation and enabling more efficient simulations. This framework offers a practical, scalable path toward adaptive surrogate models for complex, multi-scale dynamical systems.
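
The Taylor-series rollout above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `taylor_predict`, its argument layout, and the choice to reject query times outside the radius are assumptions made for the example, and the derivatives are supplied analytically for a toy function rather than predicted by a network.

```python
import math
import numpy as np

def taylor_predict(u0, derivs, t, radius):
    """Evaluate u(0) + sum_k derivs[k-1] * t**k / k! for 0 <= t <= radius.

    u0     : array, current state u(0)
    derivs : list of arrays; derivs[k-1] approximates the k-th time derivative at t=0
    t      : float, query time
    radius : float, radius within which the expansion is trusted (hypothetical name)
    """
    if not 0.0 <= t <= radius:
        raise ValueError("t must lie inside [0, radius]")
    u = np.array(u0, dtype=float)  # copy so the input state is not modified
    for k, d in enumerate(derivs, start=1):
        u = u + np.asarray(d, dtype=float) * t**k / math.factorial(k)
    return u

# Toy check on u(t) = exp(-t), whose k-th derivative at t=0 is (-1)**k.
u0 = np.array([1.0])
derivs = [np.array([(-1.0) ** k]) for k in range(1, 4)]  # orders 1..3
approx = taylor_predict(u0, derivs, t=0.1, radius=0.5)
```

In this toy setting the third-order expansion stays close to the exact value `math.exp(-0.1)`; in the paper's setting the derivatives and the radius would instead come from the decoder.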

Abstract

Operator learning for time-dependent partial differential equations (PDEs) has seen rapid progress in recent years, enabling efficient approximation of complex spatiotemporal dynamics. However, most existing methods rely on fixed time step sizes during rollout, which limits their ability to adapt to varying temporal complexity and often leads to error accumulation. Here, we propose the Time-Adaptive Transformer with Neural Taylor Expansion (TANTE), a novel operator-learning framework that produces continuous-time predictions with adaptive step sizes. TANTE predicts future states by performing a Taylor expansion at the current state, where neural networks learn both the higher-order temporal derivatives and the local radius of convergence. This allows the model to dynamically adjust its rollout based on the local behavior of the solution, thereby reducing cumulative error and improving computational efficiency. We demonstrate the effectiveness of TANTE across a wide range of PDE benchmarks, achieving superior accuracy and adaptability compared to fixed-step baselines, delivering accuracy gains of 60-80% and speed-ups of 30-40% at inference time. The code is publicly available at https://github.com/zwu88/TANTE for transparency and reproducibility.
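
To make the idea of an adaptive rollout concrete, the following minimal sketch advances a scalar state by repeatedly expanding at the current state and stepping no further than the returned radius. Everything here is a stand-in chosen for illustration: `toy_model` is a hypothetical replacement for the learned network (it returns exact derivatives of $du/dt=-u$ and a constant radius of 0.2), and `adaptive_rollout` is not the rollout routine from the TANTE repository.

```python
import math

def toy_model(u):
    """Hypothetical stand-in for the learned network on du/dt = -u.

    Returns derivatives up to order 2 at the current state and a radius.
    The radius is an arbitrary illustrative constant, not a learned value.
    """
    derivs = [-u, u]  # exact first and second time derivatives for u' = -u
    radius = 0.2
    return derivs, radius

def adaptive_rollout(u0, t_end, model):
    """Advance from t=0 to t_end, stepping by at most the predicted radius."""
    t, u, steps = 0.0, u0, 0
    while t < t_end - 1e-12:
        derivs, radius = model(u)
        dt = min(radius, t_end - t)  # never step past the trusted interval
        u = u + sum(d * dt**k / math.factorial(k)
                    for k, d in enumerate(derivs, start=1))
        t += dt       # the new state becomes the next expansion point
        steps += 1
    return u, steps

u_final, n_steps = adaptive_rollout(1.0, 1.0, toy_model)
```

A model that returns a larger radius where the dynamics are smooth would need fewer expansion points to cover the same horizon, which is the source of the efficiency gain the abstract describes.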

Paper Structure

This paper contains 53 sections, 26 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: Representative TANTE rollout predictions across four benchmarks. Each benchmark's results are shown in three rows: the first row displays the ground truth field (reference), the second row shows the predictions from TANTE, and the third row illustrates the point-wise absolute error between the predictions and the ground truth. Top Left: $Buoyancy$ field in the Rayleigh-Bénard Convection (RB) benchmark across eight time steps. Top Right: $Velocity$ field (y-direction) in the Active Matter (AM) benchmark across sixteen time steps. Middle Right: $C_{xx}$ field in the Viscoelastic Fluids (VF) benchmark across sixteen time steps. Bottom: $Density$ field in the Turbulent Radiative Layer (TR) benchmark across fourteen time steps.
  • Figure 2: Time-Adaptive Transformer with Neural Taylor Expansion (TANTE). Our framework enables continuous-time prediction with dynamically adjusted step sizes based on the local temporal complexity. TANTE consists of three main components: (a) a spatiotemporal encoder that extracts spatial tokens from input frames and modulates them with temporal information via a FiLM layer; (b) a Transformer Processor that estimates multi-order temporal derivatives at the most recent timestamp $t=0$, with each group of blocks predicting one derivative order; (c) a spatiotemporal decoder that predicts derivatives and infers a confidence interval $[0, \tilde{r}_t]$, defining the time range within which the Taylor expansion is valid. TANTE generates forecasts by summing the predicted derivatives as a Taylor series within the confidence interval. When predictions extend beyond the confidence interval, the model operates autoregressively, incorporating previously predicted states into the next-step input sequence.
  • Figure 3: Predictions of the target field at $t=4$ on the four benchmarks. For each dataset, we show one representative sample, comparing our best-performing model against the most accurate competing baselines.
  • Figure 4: Relative $L^2$ error at each of eight rollout time points for four PDE benchmarks (TR, AM, VF, and RB). TANTE-2 (blue) and TANTE-1 (red) show the lowest average error across all time steps and the minimum cumulative error compared to the best baseline methods.
  • Figure 5: (a) Transformer block allocations for TANTE variants at three parameter sizes. The Transformer blocks in TANTE are divided into $n$ groups to approximate $n$ orders of derivatives, ensuring comparable model sizes at each level of parameter count. Different colors represent blocks used for estimating derivatives at different orders. (b) Test errors and standard deviations of TANTE variants at three model sizes on the TR benchmark. The prediction accuracy positively correlates with the parameter count for each TANTE variant. Although TANTE-3 shows higher error than TANTE-2 at small and medium sizes, it exhibits better scalability and achieves the lowest error at the large parameter size. (c) Average inference time at eight-step rollout. TANTE-1 and TANTE-2 attain the lowest error (highest accuracy) while remaining as fast as other baselines. Additionally, TANTE-1 and TANTE-2 are faster than TANTE-0.
  • ...and 2 more figures