Fluid Intelligence: A Forward Look on AI Foundation Models in Computational Fluid Dynamics

Neil Ashton, Johannes Brandstetter, Siddhartha Mishra

TL;DR

This work addresses the problem of building foundation models for Computational Fluid Dynamics (CFD) by introducing a CFD-specific scaling law that couples data-generation costs and model-training costs across fidelity regimes (low-fidelity RANS, high-fidelity LES, and high-fidelity transient LES). The authors argue that incorporating high-fidelity transient data provides the optimum route to a robust CFD foundation model, and they provide concrete numerical estimates for representative configurations to illustrate the data-to-model tradeoffs. The paper also discusses how CFD foundation models differ from Large Language Models, emphasizing the continuous, multi-fidelity, and highly conditional input space (geometry, preprocessing, meshing, physics, discretization) and the need for co-designed GPU-native solvers and online training to manage storage and I/O bottlenecks. Overall, the work offers a principled framework for scaling CFD foundation models, delineates regimes where data generation or training dominates, and outlines practical avenues and open questions for advancing data-driven CFD at industrial scales.

Abstract

Driven by the advancement of GPUs and AI, the field of Computational Fluid Dynamics (CFD) is undergoing significant transformations. This paper bridges the gap between the machine learning and CFD communities by deconstructing industrial-scale CFD simulations into their core components. Our main contribution is to propose the first scaling law that incorporates CFD inputs for both data generation and model training to outline the unique challenges of developing and deploying these next-generation AI models for complex fluid dynamics problems. Using our new scaling law, we establish quantitative estimates for the large-scale limit, distinguishing between regimes where the cost of data generation is the dominant factor in total compute versus where the cost of model training prevails. We conclude that the incorporation of high-fidelity transient data provides the optimum route to a foundation model. We constrain our theory with concrete numbers, providing the first public estimates on the computational cost and time to build a foundation model for CFD.

Paper Structure

This paper contains 86 sections, 42 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: A Roofline model illustrating the performance regimes for compute-bound and memory-bound algorithms on different architectures.
  • Figure 2: Drag coefficient distribution for the DrivAerML and DrivAerNet++ datasets. Dark yellow areas mark the range of drag coefficients where the two distributions overlap.
  • Figure 3: Comparative analysis of data generation and model training costs ($y$-axis in millions of dollars) versus sample size ($x$-axis in millions) for the transient high-fidelity case. Source code provided in Appendix \ref{sec:code}.
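
For readers less familiar with the Roofline model named in Figure 1, the following is a minimal sketch of the standard roofline bound, in which attainable throughput is the smaller of the compute roof and the memory roof. It is not taken from the paper: the function names and the hardware numbers (`PEAK_FLOPS`, `MEM_BANDWIDTH`) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def attainable_flops(intensity, peak_flops, mem_bandwidth):
    """Standard roofline bound in FLOP/s.

    intensity     : arithmetic intensity in FLOP per byte moved
    peak_flops    : peak compute throughput in FLOP/s
    mem_bandwidth : peak memory bandwidth in byte/s
    """
    return min(peak_flops, mem_bandwidth * intensity)


def ridge_point(peak_flops, mem_bandwidth):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_flops / mem_bandwidth


def regime(intensity, peak_flops, mem_bandwidth):
    """Classify a kernel as memory-bound or compute-bound."""
    if intensity < ridge_point(peak_flops, mem_bandwidth):
        return "memory-bound"
    return "compute-bound"


# Hypothetical accelerator, NOT a real device specification.
PEAK_FLOPS = 10.0e12      # 10 TFLOP/s
MEM_BANDWIDTH = 1.0e12    # 1 TB/s

if __name__ == "__main__":
    # Two illustrative arithmetic intensities (assumed values).
    for name, intensity in [("low-intensity kernel", 0.25),
                            ("high-intensity kernel", 40.0)]:
        bound = attainable_flops(intensity, PEAK_FLOPS, MEM_BANDWIDTH)
        label = regime(intensity, PEAK_FLOPS, MEM_BANDWIDTH)
        print(f"{name}: {bound / 1e12:.2f} TFLOP/s ({label})")
```

With these assumed numbers the ridge point sits at 10 FLOP/byte, so any kernel below that intensity is limited by memory bandwidth rather than by peak compute.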