P3D: Scalable Neural Surrogates for High-Resolution 3D Physics Simulations with Global Context

Benjamin Holzschuh, Georg Kohl, Florian Redinger, Nils Thuerey

TL;DR

Presents P3D, a scalable surrogate framework for high-resolution 3D PDEs that blends CNNs and Transformers to enable fast, accurate autoregressive predictions and probabilistic sampling. The key ideas include a hybrid encoder–decoder with windowed attention, region tokens for region-wise global information, and a global context model that links bottleneck representations for long-range dependencies; training supports both supervised MSE and diffusion flow matching, with memory-efficient finetuning. The approach is validated across 14 PDEs, scales from crops to $512^3$ isotropic turbulence, and demonstrates diffusion-based probabilistic sampling for turbulent channel flow, achieving accurate statistics and significant speedups over baselines. This work advances scalable scientific foundation models for truly high-resolution 3D physics with practical impact in engineering, climate science, and related domains, enabling both fast surrogates and uncertainty-aware simulations.

Abstract

We present a scalable framework for learning deterministic and probabilistic neural surrogates for high-resolution 3D physics simulations. We introduce a hybrid CNN-Transformer backbone architecture targeted for 3D physics simulations, which significantly outperforms existing architectures in terms of speed and accuracy. Our proposed network can be pretrained on small patches of the simulation domain, which can be fused to obtain a global solution, optionally guided via a fast and scalable sequence-to-sequence model to include long-range dependencies. This setup allows for training large-scale models with reduced memory and compute requirements for high-resolution datasets. We evaluate our backbone architecture against a large set of baseline methods with the objective to simultaneously learn the dynamics of 14 different types of PDEs in 3D. We demonstrate how to scale our model to high-resolution isotropic turbulence with spatial resolutions of up to $512^3$. Finally, we demonstrate the versatility of our network by training it as a diffusion model to produce probabilistic samples of highly turbulent 3D channel flows across varying Reynolds numbers, accurately capturing the underlying flow statistics.

Paper Structure

This paper contains 57 sections, 5 equations, 41 figures, 10 tables.

Figures (41)

  • Figure 1: Experiments: we train P3D on 14 different PDE dynamics simultaneously and verify its high efficiency and performance in a large benchmark comparison (left). We scale P3D to a simulation of forced isotropic turbulence at resolution $512^3$, training only on crops of the simulation domain at size $128^3$ (middle). We train P3D as a diffusion model for a turbulent channel flow, assembling the simulation domain from smaller crops that are linked via a global context model (right).
  • Figure 2: Overview of P3D. Convolutional blocks for local feature processing are combined with transformers for deep representation learning, yielding a U-shaped multi-scale architecture. The transformer backbone combines windowed attention and conditioning via adaptive instance normalization, both of which are modified and optimized for 3D.
  • Figure 3: Global context via a sequence model. The bottleneck layers are connected to the sequence model, which embeds the bottleneck representation as latent tokens. Region tokens are used to inject global information directly into the decoder.
  • Figure 5: Different training and inference setups. (a) shows training on the full domain and (b) on domain crops. (c) includes the context network for global information processing, which can also be trained by randomly disabling gradient backpropagation for a percentage of the encoders and decoders, see (d). In (e) the latent codes from a pretrained encoder can be precomputed and only the context network and decoder are trained.
  • Figure 6: Comparison of model accuracy vs. (left) memory usage during backpropagation and (right) computational costs for inference for jointly learning different types of PDEs with crops of size $64^3$ for P3D and baselines.
  • ...and 36 more figures
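
The crop-based setup in the Figure 1 caption (training on $128^3$ crops of a $512^3$ domain) can be illustrated with a small sketch that enumerates the crops of a cubic domain. It is not the authors' implementation: the function name `crop_slices` is hypothetical, and the non-overlapping, axis-aligned tiling is an assumption made here for simplicity. The paper may sample crops differently, for example at random offsets or with overlap.

```python
from itertools import product


def crop_slices(domain: int, crop: int) -> list[tuple[slice, slice, slice]]:
    """Return index slices for every crop of a cubic domain.

    Assumption (not from the paper): crops are axis-aligned and do not
    overlap, so `domain` must be divisible by `crop`.
    """
    if domain % crop != 0:
        raise ValueError("domain size must be a multiple of the crop size")
    starts = range(0, domain, crop)
    # One (x, y, z) origin per crop; each origin becomes three slices.
    return [
        tuple(slice(s, s + crop) for s in origin)
        for origin in product(starts, repeat=3)
    ]


# A 512^3 domain tiled with 128^3 crops gives 4 crops per axis.
slices = crop_slices(512, 128)
print(len(slices))  # 64
```

Each returned tuple can index a 3D array directly (for example `field[slices[0]]` with a NumPy array). Under this tiling, each crop covers 1/64 of the full volume, which is where the reduction in memory per training sample comes from.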