Transolver: A Fast Transformer Solver for PDEs on General Geometries

Haixu Wu, Huakun Luo, Haowen Wang, Jianmin Wang, Mingsheng Long

TL;DR

Transolver introduces Physics-Attention, which learns intrinsic physical states as slices of discretized domains and applies attention over physics-aware tokens to solve PDEs on general geometries with linear-time complexity. By focusing on physical states rather than individual mesh points, it effectively captures complex multiphysics correlations and scales to large unstructured meshes and industrial designs. Empirically, it achieves state-of-the-art performance across eight benchmarks, demonstrates strong efficiency, and exhibits robust out-of-distribution generalization, indicating potential as a foundation model for PDE solving. The work highlights the value of physics-guided tokenization for geometry-general neural operators and suggests paths toward large-scale pretraining and broader industrial adoption.

Abstract

Transformers have empowered many milestones across various fields and have recently been applied to solve partial differential equations (PDEs). However, since PDEs are typically discretized into large-scale meshes with complex geometries, it is challenging for Transformers to capture intricate physical correlations directly from massive individual points. Going beyond superficial and unwieldy meshes, we present Transolver based on a more foundational idea, which is learning intrinsic physical states hidden behind discretized geometries. Specifically, we propose a new Physics-Attention to adaptively split the discretized domain into a series of learnable slices of flexible shapes, where mesh points under similar physical states will be ascribed to the same slice. By calculating attention to physics-aware tokens encoded from slices, Transolver can effectively capture intricate physical correlations under complex geometries, which also empowers the solver with endogenetic geometry-general modeling capacity and can be efficiently computed in linear complexity. Transolver achieves consistent state-of-the-art performance with a 22% relative gain across six standard benchmarks and also excels in large-scale industrial simulations, including car and airfoil designs. Code is available at https://github.com/thuml/Transolver.
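
The slice-then-attend idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): it is a single-head, pure-Python version written under several assumptions. The weight names `w_slice`, `w_q`, `w_k`, `w_v`, the weight-normalized averaging used to build tokens, and the final step that broadcasts tokens back to mesh points are all illustrative choices. With N mesh points, M slices, and C channels, the cost is O(NMC + M²C), which is linear in N for fixed M.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(val - top) for val in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def physics_attention(x, w_slice, w_q, w_k, w_v):
    """Sketch of slice-based attention.

    x is an N x C list of mesh-point features. w_slice (C x M) and
    w_q, w_k, w_v (C x C) are hypothetical weight matrices.
    Returns the updated N x C features and the N x M slice weights.
    """
    # 1) Softly assign every mesh point to M slices (each row sums to 1).
    weights = softmax_rows(matmul(x, w_slice))
    # 2) One token per slice: weight-normalized average of point features (M x C).
    mass = [sum(col) for col in zip(*weights)]
    summed = matmul(transpose(weights), x)
    tokens = [[val / mass[j] for val in row] for j, row in enumerate(summed)]
    # 3) Scaled dot-product attention among the M tokens only.
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    updated = matmul(softmax_rows(scores), v)
    # 4) Broadcast updated tokens back to mesh points with the same slice weights.
    return matmul(weights, updated), weights


def random_matrix(rows, cols):
    """Small random matrix standing in for learned weights or input features."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
N, C, M = 200, 4, 8  # mesh points, channels, slices (arbitrary demo sizes)
x = random_matrix(N, C)
out, weights = physics_attention(
    x, random_matrix(C, M), random_matrix(C, C), random_matrix(C, C), random_matrix(C, C)
)
print(len(out), len(out[0]))
```

Because attention runs over the M slice tokens rather than the N mesh points, the quadratic term depends only on M; the mesh size enters only through the two N x M products in steps 1, 2, and 4.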

Paper Structure (63 sections, 3 theorems, 16 equations, 21 figures, 16 tables)

Key Result

Theorem 3.4

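Given input function $\boldsymbol{u}:\Omega\to\mathbb{R}^{C}$ and a mesh point $\mathbf{g}^\ast\in\Omega$, Physics-Attention approximates the integral operator $\mathcal{G}$, which is defined as: $$\mathcal{G}(\boldsymbol{u})(\mathbf{g}^\ast)=\int_{\Omega}\kappa(\mathbf{g}^\ast,\boldsymbol{\xi})\,\boldsymbol{u}(\boldsymbol{\xi})\,\mathrm{d}\boldsymbol{\xi},$$ where $\kappa(\cdot,\cdot)$ denotes the kernel function defined on $\Omega\times \Omega$.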
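
As a hedged illustration of why attention can be read as a learnable kernel integral, the standard Monte Carlo argument for plain attention over mesh points is sketched here. It is not the paper's proof for Physics-Attention; the projections $\mathbf{W}_q,\mathbf{W}_k,\mathbf{W}_v$ and the assumption that the mesh points $\mathbf{g}_1,\dots,\mathbf{g}_N$ sample $\Omega$ roughly uniformly are introduced only for this sketch. Under that sampling assumption,

$$\int_{\Omega}\kappa(\mathbf{g}^\ast,\boldsymbol{\xi})\,\boldsymbol{u}(\boldsymbol{\xi})\,\mathrm{d}\boldsymbol{\xi}\;\approx\;\frac{|\Omega|}{N}\sum_{i=1}^{N}\kappa(\mathbf{g}^\ast,\mathbf{g}_i)\,\boldsymbol{u}(\mathbf{g}_i).$$

Writing $s(\boldsymbol{\xi})=\langle\mathbf{W}_q\boldsymbol{u}(\mathbf{g}^\ast),\mathbf{W}_k\boldsymbol{u}(\boldsymbol{\xi})\rangle/\sqrt{C}$ and choosing the normalized exponential kernel

$$\kappa(\mathbf{g}^\ast,\boldsymbol{\xi})=\frac{\exp\big(s(\boldsymbol{\xi})\big)}{\int_{\Omega}\exp\big(s(\boldsymbol{\xi}')\big)\,\mathrm{d}\boldsymbol{\xi}'}\,\mathbf{W}_v,$$

the same approximation applied to the denominator cancels the factor $|\Omega|/N$ and leaves

$$\sum_{i=1}^{N}\frac{\exp\big(s(\mathbf{g}_i)\big)}{\sum_{j=1}^{N}\exp\big(s(\mathbf{g}_j)\big)}\,\mathbf{W}_v\,\boldsymbol{u}(\mathbf{g}_i),$$

which is softmax attention with query $\mathbf{W}_q\boldsymbol{u}(\mathbf{g}^\ast)$, keys $\mathbf{W}_k\boldsymbol{u}(\mathbf{g}_i)$, and values $\mathbf{W}_v\boldsymbol{u}(\mathbf{g}_i)$.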

Figures (21)

  • Figure 1: Visualization of learned slices in Transolver. For each case, the leftmost subfigure is the model input and the right shows the learned slices. A brighter color indicates that the mesh point is more strongly ascribed to the corresponding slice. See the appendix for more visualizations.
  • Figure 2: Learning physics-aware tokens from Transolver slices.
  • Figure 3: Overall design of the Transolver layer, which replaces standard attention with Physics-Attention. Each head encodes the input domain into a series of physics-aware tokens and then captures physical correlations under intricate geometries by attention among tokens.
  • Figure 4: Car and airfoil design tasks. The key problem is to estimate the drag and lift forces of a driving car or a flying airplane.
  • Figure 5: Physics-Attention visualization on Elasticity: (a) slice weights in the last layer of Transolver for both original and resampled meshes, (b) attention maps of the last layer in Transolver and Galerkin Transformer (Cao, 2021). See the appendix for more visualizations.
  • ...and 16 more figures

Theorems & Definitions (11)

  • Remark 3.1: Why slices can learn physically internal-consistent information
  • Remark 3.2: Learning slice is different from splitting computation area
  • Remark 3.3: Attention as learnable integral operator
  • Theorem 3.4: Physics-Attention is equivalent to learnable integral on $\Omega$
  • proof
  • Lemma 1.1
  • proof
  • Remark 1.2: Solving PDEs by learning integral neural operators
  • Lemma 1.3
  • proof
  • ...and 1 more