Flowers: A Warp Drive for Neural PDE Solvers

Till Muser, Alexandra Spitzer, Matti Lassas, Maarten V. de Hoop, Ivan Dokmanić

TL;DR

We introduce Flowers, a neural architecture for learning PDE solution operators built entirely from multihead warps, and theoretically motivate this design through three complementary lenses: flow maps for conservation laws, waves in inhomogeneous media, and a kinetic-theoretic continuum limit.

Abstract

We introduce Flowers, a neural architecture for learning PDE solution operators built entirely from multihead warps. Aside from pointwise channel mixing and a multiscale scaffold, Flowers use no Fourier multipliers, no dot-product attention, and no convolutional mixing. Each head predicts a displacement field and warps the mixed input features. Motivated by physics and computational efficiency, displacements are predicted pointwise, without any spatial aggregation, and nonlocality enters only through sparse sampling at source coordinates, one per head. Stacking warps in multiscale residual blocks yields Flowers, which implement adaptive, global interactions at linear cost. We theoretically motivate this design through three complementary lenses: flow maps for conservation laws, waves in inhomogeneous media, and a kinetic-theoretic continuum limit. Flowers achieve excellent performance on a broad suite of 2D and 3D time-dependent PDE benchmarks, particularly flows and waves. A compact 17M-parameter model consistently outperforms Fourier, convolution, and attention-based baselines of similar size, while a 150M-parameter variant improves over recent transformer-based foundation models that use many more parameters and far more data and training compute.

Paper Structure (45 sections, 94 equations, 12 figures, 8 tables)

Figures (12)

  • Figure 1: The multi-head Selfwarp layer. (a) Given an input feature map $u$ with $C$ channels, (b) for each pixel $x$ in the image the layer computes (c) a local per-head interaction $v^{(h)}(x) \in \mathbb R^{C_h}$ via a linear (affine) projection of the input $u(x)$ and (d) a displacement field (one per head) $\varrho^{(h)}(x)$, also pointwise, using a small MLP. Each $C_h$-channel per-head interaction field $v^{(h)}$ (e) is warped along the same per-head displacement field $\varrho^{(h)}$ (f) to compute the warped head output (g). The outputs of all heads are concatenated at the output (h).
  • Figure 2: Wavefronts and rays picture: high-frequency waves propagate along rays. The solution at position $x$ and time $t$ is a sum of contributions along the wavefront, which can be indexed by head $\eta$.
  • Figure 3: Comparison of model predictions on the viscoelastic instability dataset (4→1 unconditioned setting), showing the conformation tensor entry $C_{zz}$. The top row shows the ground truth and the prediction of each model; the bottom row shows the last of the four input frames and the prediction errors.
  • Figure 4: Comparison of autoregressive rollout model predictions on the shear flow dataset (4→1 unconditioned setting), showing the tracer.
  • Figure 5: Comparison of model predictions on the Rayleigh-Taylor instability dataset (4→1 unconditioned setting), showing the density. Visualized using vape4d (koehler2024apebench).
  • ...and 7 more figures
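
The Selfwarp layer described in the Figure 1 caption can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names (`bilinear_sample`, `selfwarp`), the parameter names, the two-layer `tanh` MLP, the bilinear interpolation, the border clamping, and the choice of pixel units for the displacement are all assumptions made for the example. What it takes from the caption is the structure: a pointwise affine projection $v$, a pointwise per-head displacement $\varrho^{(h)}$, one sampled source coordinate per head and pixel, and concatenation of the head outputs.

```python
import numpy as np

def bilinear_sample(v, coords):
    """Sample v (H, W, C) at fractional (row, col) coords of shape (H, W, 2).

    Coordinates are clamped to the image border (an assumption of this sketch).
    """
    H, W, _ = v.shape
    r = np.clip(coords[..., 0], 0, H - 1)
    c = np.clip(coords[..., 1], 0, W - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, H - 1)
    c1 = np.minimum(c0 + 1, W - 1)
    wr = (r - r0)[..., None]  # fractional row weight, broadcast over channels
    wc = (c - c0)[..., None]  # fractional column weight
    top = (1 - wc) * v[r0, c0] + wc * v[r0, c1]
    bot = (1 - wc) * v[r1, c0] + wc * v[r1, c1]
    return (1 - wr) * top + wr * bot

def selfwarp(u, W_v, b_v, W1, b1, W2, b2, num_heads):
    """Hypothetical multi-head self-warp on a feature map u of shape (H, W, C)."""
    H, W, C = u.shape
    C_h = C // num_heads
    # Pointwise affine projection of the input features: v(x) = u(x) W_v + b_v.
    v = u @ W_v + b_v
    # Pointwise MLP predicting one 2D displacement per head (in pixels, assumed).
    hidden = np.tanh(u @ W1 + b1)
    rho = (hidden @ W2 + b2).reshape(H, W, num_heads, 2)
    # Base grid of target coordinates x.
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([rows, cols], axis=-1).astype(float)
    heads = []
    for h in range(num_heads):
        v_h = v[..., h * C_h:(h + 1) * C_h]
        # Each head reads its channels at a single source point x + rho_h(x).
        heads.append(bilinear_sample(v_h, grid + rho[:, :, h]))
    return np.concatenate(heads, axis=-1)
```

In this sketch every output pixel reads exactly one source location per head, so the work grows linearly with the number of pixels, in line with the linear-cost claim in the abstract. Any spatial mixing comes solely from where the displacement points, since both the projection and the MLP act on single pixels.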