
Efficient Generative Transformer Operators For Million-Point PDEs

Armand Kassaï Koupaï, Lise Le Boudec, Patrick Gallinari

TL;DR

ECHO introduces a scalable transformer-operator for million-point PDE trajectories that combines a hierarchical spatio-temporal encoder, a generative flow-matching transformer, and a three-stage training scheme to enable high-fidelity, long-horizon predictions on irregular meshes. By operating in a compressed latent space and generating full trajectory segments, it mitigates long-range error drift and supports forward, inverse, interpolation, and conditional/unconditional tasks without retraining. Empirical results show state-of-the-art performance across diverse PDEs and geometries, including irregular grids and 3D regimes, while maintaining competitive latency and enabling zero-shot and few-shot adaptation to new parameters. This work advances scalable, multi-task PDE surrogates with robust out-of-distribution generalization and efficient inference for large-scale scientific computing applications.
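The generative component is trained with a flow-matching objective. As a minimal sketch only, the snippet below shows a generic conditional flow-matching loss and an Euler sampler on latent tokens. It assumes the common straight-line path x_t = (1 - t) * x0 + t * x1 between Gaussian noise x0 and data x1. The paper's exact path, time sampling and parameter conditioning are not given here, and `velocity_fn` is a hypothetical stand-in for the DiT.

```python
import numpy as np

def flow_matching_loss(velocity_fn, x1, rng):
    """Conditional flow-matching loss on a batch of latent tokens x1.

    Assumes the linear path x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I),
    whose target velocity is x1 - x0. velocity_fn(x_t, t) is a hypothetical
    callable standing in for the transformer, not ECHO's network.
    """
    x0 = rng.standard_normal(x1.shape)                     # noise sample
    t_shape = (x1.shape[0],) + (1,) * (x1.ndim - 1)        # one time per sample, broadcastable
    t = rng.uniform(size=t_shape)                          # t in [0, 1)
    xt = (1.0 - t) * x0 + t * x1                           # point on the straight path
    target = x1 - x0                                       # constant velocity along the path
    pred = velocity_fn(xt, t)
    return float(np.mean((pred - target) ** 2))

def sample(velocity_fn, shape, rng, n_steps=50):
    """Generate tokens by Euler-integrating dx/dt = v(x, t) from t = 0 to t = 1."""
    x = rng.standard_normal(shape)                         # start from noise
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((shape[0],) + (1,) * (len(shape) - 1), k * dt)
        x = x + dt * velocity_fn(x, t)                     # explicit Euler step
    return x
```

As a sanity check, passing the exact velocity of a single-point data distribution, `(c - x) / (1 - t)`, drives the loss to zero and makes `sample` land on `c`. This confirms that the path and the sampler agree.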

Abstract

We introduce ECHO, a transformer-operator framework for generating million-point PDE trajectories. While existing neural operators (NOs) have shown promise for solving partial differential equations, they remain limited in practice by poor scalability on dense grids, error accumulation during dynamic unrolling, and task-specific design. ECHO addresses these challenges through three key innovations. (i) It employs a hierarchical convolutional encoder-decoder architecture that achieves 100× spatio-temporal compression while preserving fidelity at mesh points. (ii) It incorporates a training and adaptation strategy that enables high-resolution PDE solution generation from sparse input grids. (iii) It adopts a generative modeling paradigm that learns complete trajectory segments, mitigating long-horizon error drift. The training strategy decouples representation learning from downstream task supervision, allowing the model to tackle multiple tasks such as trajectory generation, forward and inverse problems, and interpolation. The generative model further supports both conditional and unconditional generation. We demonstrate state-of-the-art performance on million-point simulations across diverse PDE systems featuring complex geometries, high-frequency dynamics, and long-term horizons.
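A spatio-temporal compression ratio such as the 100× quoted above compares the number of stored values before and after encoding. The sketch below shows how hierarchical stages compound into an overall ratio. It assumes a regular (T, H, W, C) grid and uses a hypothetical per-stage parametrisation (time stride, spatial stride, output channels) that is not ECHO's actual configuration. For an irregular mesh, the input count would instead be frames × points × channels.

```python
from math import prod

def compression_ratio(input_shape, latent_shape):
    """Number of stored values before encoding divided by the number after."""
    return prod(input_shape) / prod(latent_shape)

def latent_shape_after(input_grid_shape, stages):
    """Apply hierarchical downsampling stages to a (T, H, W, C) regular grid.

    Each stage is (time_stride, space_stride, out_channels). This is an
    illustrative, hypothetical parametrisation, not ECHO's configuration.
    """
    t, h, w, c = input_grid_shape
    for time_stride, space_stride, out_channels in stages:
        # Each stage shrinks time and both spatial axes and resets the channel count.
        t, h, w, c = t // time_stride, h // space_stride, w // space_stride, out_channels
    return (t, h, w, c)

stages = [(2, 2, 16), (2, 2, 16), (1, 2, 8)]               # illustrative only
latent = latent_shape_after((32, 128, 128, 2), stages)     # (8, 16, 16, 8)
ratio = compression_ratio((32, 128, 128, 2), latent)       # 64.0
```

With these three illustrative stages, a 32-frame 128×128 two-channel field becomes an 8×16×16×8 latent, a 64× reduction. Each additional stage multiplies the ratio, which is why deeper hierarchies can reach high compression.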

Paper Structure

This paper contains 106 sections, 20 equations, 25 figures, 21 tables.

Figures (25)

  • Figure 1: Experimental analysis of ECHO. Left: Deep, iterative hierarchical compression improves accuracy, especially at high compression ratios on Vorticity. Middle: ECHO’s full-trajectory generation mitigates error accumulation in long-range Gray-Scott rollouts (40 to 160 lies outside the training horizon), outperforming latent autoregressive and deterministic baselines. Right: Generative modeling consistently outperforms deterministic methods on forward, interpolation, and inverse Rayleigh-Bénard tasks. All plots report relative L2 error (lower is better).
  • Figure 2: Architecture of the ECHO framework. ECHO comprises two components: (B) a convolutional auto-encoder and (C) a DiT-based generative process. The auto-encoder uses continuous convolutions to ingest irregular input grids of arbitrary size, map the dynamics to a regular latent grid, and hierarchically compress it; the decoder mirrors this hierarchy and applies a final continuous convolution, enabling queries at arbitrary output locations. The DiT module is trained with a flow-matching objective to denoise latent tokens, optionally conditioned on PDE parameters. This design allows ECHO to handle irregular grids and support multiple inference tasks (A).
  • Figure 3: Three-stage training strategy for million-point trajectory generation. ECHO’s auto-encoder is first trained in two steps: (1) low-resolution trajectory training and (2) high-resolution refinement on single frames. Both steps use a reconstruction objective on the input data. The generative process is then trained separately with a flow-matching objective on the encoded tokens (3), with the encoder–decoder frozen.
  • Figure 4: Vorticity analysis: (a) qualitative ground-truth vs ECHO long-range rollouts beyond the training horizon; (b) relative MSE across prediction horizons; (c) in-distribution and OOD generalization.
  • Figure 5: Sample of the Vorticity dataset.
  • ...and 20 more figures
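Figure 1 reports relative L2 error. Below is a minimal sketch of that metric. It assumes the norm is taken over all non-batch axes of each sample (time, points, channels) and then averaged over the batch. The paper's exact averaging convention is an assumption here.

```python
import numpy as np

def relative_l2(pred, target, eps=1e-12):
    """Mean over samples of ||pred - target||_2 / ||target||_2.

    pred, target: arrays of shape (batch, ...). Norms are taken over all
    non-batch axes; eps guards against an all-zero target.
    """
    b = target.shape[0]
    diff = (pred - target).reshape(b, -1)
    ref = target.reshape(b, -1)
    per_sample = np.linalg.norm(diff, axis=1) / (np.linalg.norm(ref, axis=1) + eps)
    return float(per_sample.mean())
```

Because it is normalised by the target norm, a prediction uniformly scaled by 1.1 scores about 0.1, regardless of the field's magnitude.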
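Figure 2 describes continuous convolutions that map irregular input points onto a regular latent grid. Learned continuous convolutions typically parametrise the kernel as a small network of relative offsets. The sketch below swaps in a fixed, normalised Gaussian kernel on [0, 1]^2 purely to illustrate point-to-grid aggregation. `points_to_grid` and `bandwidth` are hypothetical names, not ECHO's API. The sketch builds a dense (grid nodes × points) weight matrix, so a million-point version would need a radius or nearest-neighbour search instead.

```python
import numpy as np

def points_to_grid(coords, values, grid_size, bandwidth):
    """Aggregate features at irregular 2D points onto a regular grid on [0, 1]^2.

    coords: (N, 2) point positions; values: (N, C) features.
    Each grid node receives a normalised Gaussian-weighted average of the
    point values: a fixed-kernel stand-in for a learned continuous kernel.
    """
    axis = (np.arange(grid_size) + 0.5) / grid_size             # cell-centre coordinates
    gx, gy = np.meshgrid(axis, axis, indexing="ij")
    nodes = np.stack([gx.ravel(), gy.ravel()], axis=1)          # (G, 2) grid nodes
    d2 = ((nodes[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # (G, N) squared distances
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))                    # kernel weights
    w /= w.sum(axis=1, keepdims=True) + 1e-12                   # normalise per grid node
    grid = w @ values                                           # (G, C) aggregated features
    return grid.reshape(grid_size, grid_size, values.shape[1])
```

A constant input field maps to a constant grid, which is a quick check that the weights are normalised. The decoder's final continuous convolution performs the reverse step, evaluating features at arbitrary query locations.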