CFDBench: A Large-Scale Benchmark for Machine Learning Methods in Fluid Dynamics

Yining Luo, Yingfa Chen, Zhen Zhang

TL;DR

CFDBench provides a large-scale, parameter-conditioned CFD benchmark to rigorously assess neural operators’ generalization to unseen boundary conditions, fluid properties, and domain geometries. It defines four classic flow problems, generates 302K frames via industry-grade solvers, and interpolates data to 64×64 grids, enabling fair comparisons of non-autoregressive and autoregressive models (FFN, DeepONet, FNO, U-Net, etc.). The study reveals significant generalization gaps, with many baselines exhibiting large errors and notable autoregressive error accumulation, underscoring the need for improved parameter-aware architectures and training strategies. By offering a standardized evaluation framework, CFDBench advances rigorous benchmarking and reproducibility for data-driven CFD solvers with practical implications for fast, generalizable surrogate modeling.

Abstract

In recent years, applying deep learning to solve physics problems has attracted much attention. Data-driven deep learning methods produce fast numerical operators that learn approximate solutions to a whole system of partial differential equations (i.e., surrogate modeling). Although these neural networks may be less accurate than traditional numerical methods, once trained they are orders of magnitude faster at inference. Hence, one crucial feature is that these operators can generalize to unseen PDE parameters without expensive re-training. In this paper, we construct CFDBench, a benchmark tailored for evaluating the generalization ability of trained neural operators on computational fluid dynamics (CFD) problems. It features four classic CFD problems: lid-driven cavity flow, laminar boundary layer flow in circular tubes, dam flows through the steps, and periodic Kármán vortex street. The data contains a total of 302K frames of velocity and pressure fields from 739 cases with different operating condition parameters, generated with numerical methods. We evaluate the effectiveness of popular neural operators, including feed-forward networks, DeepONet, FNO, and U-Net, on CFDBench by predicting flows with non-periodic boundary conditions, fluid properties, and flow domain shapes that are not seen during training. Appropriate modifications were made to these networks so that they can be applied to CFDBench and accommodate more varying inputs. Empirical results on CFDBench show that many baseline models have errors as high as 300% on some problems and suffer severe error accumulation during autoregressive inference. CFDBench enables a more comprehensive comparison of neural operators for CFD than existing benchmarks.
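The error accumulation mentioned for autoregressive inference can be illustrated with a toy rollout. This is a minimal sketch, not the benchmark's code: `true_step`, `surrogate_step`, the 1% per-step error, and the relative error metric are all hypothetical choices made for illustration.

```python
def rollout(step_fn, u0, n_steps):
    """Apply step_fn repeatedly, feeding each output back in as the next input."""
    frames = [u0]
    for _ in range(n_steps):
        frames.append(step_fn(frames[-1]))
    return frames


def relative_error(pred, true):
    """Sum of absolute errors divided by the sum of absolute reference values."""
    num = sum(abs(p - t) for p, t in zip(pred, true))
    den = sum(abs(t) for t in true)
    return num / den


def true_step(u):
    # Hypothetical reference dynamics: every value decays by 1% per step.
    return [0.99 * x for x in u]


def surrogate_step(u):
    # Hypothetical learned model: same dynamics with a 1% multiplicative error.
    return [0.99 * 1.01 * x for x in u]


u0 = [1.0, 2.0, 3.0, 4.0]
true_frames = rollout(true_step, u0, 50)
pred_frames = rollout(surrogate_step, u0, 50)

# Per-step error of the rollout against the reference trajectory.
errors = [relative_error(p, t) for p, t in zip(pred_frames[1:], true_frames[1:])]
print(f"step 1: {errors[0]:.3f}, step 50: {errors[-1]:.3f}")
```

Because each prediction becomes the next input, the one-step error of about 1% compounds at every step instead of staying constant, which is the qualitative behavior the abstract describes.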

Paper Structure (59 sections, 21 equations, 13 figures, 15 tables)

Figures (13)

  • Figure 1: Some examples of the velocity field in the four problems in CFDBench. From left to right: cavity flow, tube flow, dam flow, and cylinder flow.
  • Figure 2: The residuals of each flow problem in this paper. (a) cavity flow, (b) tube flow, (c) dam flow, (d) cylinder flow.
  • Figure 3: Overview of the input and output types and shapes of each baseline model.
  • Figure 4: The structure of each baseline model in this paper.
  • Figure 5: The prediction results of 2 non-autoregressive baseline models on 7 subsets of the data set, with the vertical axis representing the average NMSE of all frames in the test set and the horizontal axis representing the data type.
  • ...and 8 more figures
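
The Figure 5 caption reports the average NMSE over all frames in the test set. The sketch below shows one common way to compute such a number; the normalization used here (sum of squared errors divided by the sum of squared reference values, per frame) is an assumption, and the paper's exact definition may differ. The function names are hypothetical.

```python
def nmse(pred, true):
    """NMSE of one 2D frame: squared error normalized by squared reference magnitude."""
    num = 0.0
    den = 0.0
    for pred_row, true_row in zip(pred, true):
        for p, t in zip(pred_row, true_row):
            num += (p - t) ** 2
            den += t ** 2
    return num / den


def mean_nmse(pred_frames, true_frames):
    """Average the per-frame NMSE over a sequence of frames."""
    scores = [nmse(p, t) for p, t in zip(pred_frames, true_frames)]
    return sum(scores) / len(scores)


# Two tiny 2x2 frames: the first prediction is exact, the second is off by 1 in one cell.
true_frames = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]]]
pred_frames = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 5.0]]]

print(mean_nmse(pred_frames, true_frames))
```

In this example the second frame has a squared error of 1 against a squared reference magnitude of 30, so its NMSE is 1/30 and the mean over both frames is 1/60. Normalizing per frame keeps problems with different velocity scales comparable on one axis.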