Benchmarking neural surrogates on realistic spatiotemporal multiphysics flows
Runze Mao, Rui Zhang, Xuan Bai, Tianhao Wu, Teng Zhang, Zhenyi Chen, Minqi Lin, Bocheng Zeng, Yangchen Xu, Yingxuan Xiang, Haoze Zhang, Shubham Goswami, Pierre A. Dawe, Yifan Xu, Zhenhua An, Mengtao Yan, Xiaoyi Lu, Yi Wang, Rongbo Bai, Haobu Gao, Xiaohang Fang, Han Li, Hao Sun, Zhi X. Chen
TL;DR
REALM introduces a rigorous benchmark to evaluate neural surrogates on realistic multiphysics flows governed by PDE-ODE couplings, using 11 high-fidelity datasets across canonical and industrial scenarios. The authors provide an end-to-end framework with standardized preprocessing, rollout training, and capacity-aligned model presets, enabling fair cross-architecture comparisons among spectral, transformer, CNN, and graph-based surrogates. Across 2D/3D regular and irregular meshes and stiff chemistry, they observe (i) a scaling barrier tied to dimensionality, stiffness, and mesh irregularity, (ii) performance that depends more on architectural inductive biases than on parameter count, and (iii) a persistent gap between nominal metrics and physically faithful long-horizon behavior. The study highlights the need for physics-aware architectures and for evaluation criteria focused on conservation and long-horizon fidelity, and offers REALM as a benchmark to drive development of robust surrogates.
Abstract
Predicting multiphysics dynamics is computationally expensive and challenging due to the severe coupling of multi-scale, heterogeneous physical processes. While neural surrogates promise a paradigm shift, the field currently suffers from an "illusion of mastery", as repeatedly emphasized in top-tier commentaries: existing evaluations rely too heavily on simplified, low-dimensional proxies, which fail to expose the models' inherent fragility in realistic regimes. To bridge this critical gap, we present REALM (REalistic AI Learning for Multiphysics), a rigorous benchmarking framework designed to test neural surrogates on challenging, application-driven reactive flows. REALM features 11 high-fidelity datasets ranging from canonical multiphysics problems to complex propulsion and fire safety scenarios, alongside a standardized end-to-end training and evaluation protocol that incorporates multiphysics-aware preprocessing and a robust rollout strategy. Using this framework, we systematically benchmark over a dozen representative surrogate model families, including spectral operators, convolutional models, Transformers, pointwise operators, and graph/mesh networks, and identify three robust trends: (i) a scaling barrier governed jointly by dimensionality, stiffness, and mesh irregularity, leading to rapidly growing rollout errors; (ii) performance primarily controlled by architectural inductive biases rather than parameter count; and (iii) a persistent gap between nominal accuracy metrics and physically trustworthy behavior, where models with high correlations still miss key transient structures and integral quantities. Overall, REALM exposes the limits of current neural surrogates on realistic multiphysics flows and offers a rigorous testbed to drive the development of next-generation physics-aware architectures.
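Trend (iii) can be illustrated with a toy example. The following is a minimal sketch, not REALM's evaluation code: the Gaussian profile, the 20% amplitude deficit, and the helper names `pearson_correlation` and `relative_integral_error` are all assumptions made for illustration. It shows how a prediction can correlate perfectly with a reference field while still misestimating an integral quantity.

```python
import numpy as np


def pearson_correlation(pred, ref):
    """Pearson correlation between two fields, flattened to 1D."""
    p = pred.ravel() - pred.mean()
    r = ref.ravel() - ref.mean()
    return float(p @ r / (np.linalg.norm(p) * np.linalg.norm(r)))


def relative_integral_error(pred, ref, dx):
    """Relative error of the domain integral (rectangle rule, uniform grid)."""
    total_ref = ref.sum() * dx
    total_pred = pred.sum() * dx
    return float(abs(total_pred - total_ref) / abs(total_ref))


# Hypothetical 1D reference profile (e.g. a localized source term).
x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
ref = np.exp(-((x - 0.5) / 0.1) ** 2)

# Hypothetical surrogate output: right shape, amplitude 20% too low.
pred = 0.8 * ref

corr = pearson_correlation(pred, ref)
err = relative_integral_error(pred, ref, dx)
print(f"correlation = {corr:.4f}, relative integral error = {err:.2%}")
```

Because correlation is invariant to a positive rescaling of the prediction, it stays at 1 (up to rounding) here even though the integrated quantity is off by 20%. This is one simple way a nominal accuracy metric can mask a physically relevant error.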
