CarBench: A Comprehensive Benchmark for Neural Surrogates on High-Fidelity 3D Car Aerodynamics

Mohamed Elrefaie, Dule Shu, Matt Klenk, Faez Ahmed

TL;DR

CarBench is a unified, open benchmark for neural surrogates of high-fidelity 3D car aerodynamics. It uses the DrivAerNet++ dataset and evaluates 11 architectures on interpolation and cross-category generalization under a standardized pipeline. It introduces bootstrap-based uncertainty quantification, full-mesh evaluation, and component-level tests (e.g., wheel aerodynamics) to ensure physically grounded assessment. In the results, transformer-based solvers (notably AB-UPT and Transolver variants) achieve the best accuracy–efficiency trade-offs, and dataset scale and geometric diversity prove critical for robust zero-shot generalization. By providing standardized data processing, evaluation protocols, and open-source tooling, CarBench aims to catalyze reproducible progress in data-driven CFD surrogates for automotive and related external-aero applications.

Abstract

Benchmarking has been the cornerstone of progress in computer vision, natural language processing, and the broader deep learning domain, driving algorithmic innovation through standardized datasets and reproducible evaluation protocols. The growing availability of large-scale Computational Fluid Dynamics (CFD) datasets has opened new opportunities for applying machine learning to aerodynamic and engineering design. Yet, despite this progress, there exists no standardized benchmark for large-scale numerical simulations in engineering design. In this work, we introduce CarBench, the first comprehensive benchmark dedicated to large-scale 3D car aerodynamics, performing a large-scale evaluation of state-of-the-art models on DrivAerNet++, the largest public dataset for automotive aerodynamics, containing over 8,000 high-fidelity car simulations. We assess eleven architectures spanning neural operator methods (e.g., Fourier Neural Operator), geometric deep learning (PointNet, RegDGCNN, PointMAE, PointTransformer), transformer-based neural solvers (Transolver, Transolver++, AB-UPT), and implicit field networks (TripNet). Beyond standard interpolation tasks, we perform cross-category experiments in which transformer-based solvers trained on a single car archetype are evaluated on unseen categories. Our analysis covers predictive accuracy, physical consistency, computational efficiency, and statistical uncertainty. To accelerate progress in data-driven engineering, we open-source the benchmark framework, including training pipelines, uncertainty estimation routines based on bootstrap resampling, and pretrained model weights, establishing the first reproducible foundation for large-scale learning from high-fidelity CFD simulations, available at https://github.com/Mohamedelrefaie/CarBench.
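The abstract mentions uncertainty estimation based on bootstrap resampling. As a rough illustration of the general idea, the sketch below computes a percentile bootstrap confidence interval for the mean of per-design test errors. It is a minimal sketch under stated assumptions and not the CarBench implementation: the function name `bootstrap_ci`, its parameters, and the choice of the percentile method are all hypothetical.

```python
import numpy as np


def bootstrap_ci(per_design_errors, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-design errors.

    Hypothetical helper for illustration; the benchmark's own routine
    may differ in resampling unit, statistic, and interval type.
    """
    errors = np.asarray(per_design_errors, dtype=float)
    rng = np.random.default_rng(seed)
    n = errors.size
    # Draw n_resamples index sets of size n, with replacement.
    idx = rng.integers(0, n, size=(n_resamples, n))
    # Mean error of each resampled test set.
    resampled_means = errors[idx].mean(axis=1)
    # Lower and upper percentile bounds of the (1 - alpha) interval.
    lower, upper = np.quantile(resampled_means, [alpha / 2, 1 - alpha / 2])
    return errors.mean(), lower, upper


if __name__ == "__main__":
    # Synthetic per-design errors, used only to exercise the function.
    demo_errors = np.linspace(0.1, 0.3, 200)
    mean, lower, upper = bootstrap_ci(demo_errors)
    print(f"mean={mean:.4f}, 95% CI=[{lower:.4f}, {upper:.4f}]")
```

Resampling whole test designs, rather than individual mesh points, keeps each resample a plausible test set; whether CarBench resamples at that level is an assumption here.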

Paper Structure

This paper contains 13 sections, 10 equations, 9 figures, 7 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of CarBench. The benchmark provides a unified framework for evaluating machine learning models on high-fidelity automotive aerodynamics. It integrates diverse car geometries, CFD-derived surface pressure fields, and automated training pipelines with physics-based and ML metrics. Architectures from multiple model families (GNNs, Geometric Deep Learning, Neural Operators, Transformers, and Triplane networks) are evaluated under a standardized protocol. The framework includes cross-category generalization, uncertainty quantification, computational efficiency analysis, statistical significance evaluation, subsampled vs. full-mesh performance comparison, and wheel aerodynamics prediction. Together, these components enable a comprehensive and physically grounded assessment of aerodynamic surrogate models.
  • Figure 2: Performance–efficiency trade-offs across model families on CarBench. Left: Relative L2 error versus model size (in millions of parameters, log scale). Right: Relative L2 error versus mean inference time (ms) on an NVIDIA A100 GPU. Transformer-based models and implicit-field architectures (AB-UPT, TransolverLarge, Transolver) achieve the best accuracy–efficiency balance, combining low error with compact parameter counts and fast inference. Point-based baselines suffer from higher errors despite lower complexity, while graph-based networks incur large memory and latency overheads. These plots highlight the Pareto frontier of aerodynamic surrogate modeling, revealing models that are both accurate and scalable.
  • Figure 3: Qualitative comparison of surface pressure predictions for design E_S_WW_WM_648 from the unseen test set of the DrivAerNet++ dataset. Both views illustrate model accuracy in predicting detailed aerodynamic pressure fields.
  • Figure 4: Pressure prediction on the front-left wheel compared to the CFD ground truth. Models are ordered from worst to best (left to right) based on their Relative L2 error. Wheels can contribute up to 25% of a car’s total aerodynamic drag, making accurate pressure modeling around rotating components critical. Compared to prior open-source datasets, DrivAerNet++ uniquely provides detailed and smooth wheel and tire geometries, enabling evaluation across both open and closed wheel configurations.
  • Figure 5: Training and validation loss curves for eight models evaluated on the DrivAerNet++ dataset. Each plot shows model convergence behavior over training epochs, with blue indicating training loss and red indicating validation loss. These trends highlight differences in optimization stability and representational efficiency across network architectures for aerodynamic learning.
  • ...and 4 more figures
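
Several captions above rank models by Relative L2 error. This section does not spell out the metric, so the sketch below uses the common definition, the L2 norm of the prediction error divided by the L2 norm of the ground truth, computed for a single design's surface pressure field. The function name `relative_l2` and the `eps` guard are assumptions made for illustration, and the benchmark may aggregate the metric differently across designs.

```python
import numpy as np


def relative_l2(pred, target, eps=1e-12):
    """Relative L2 error ||pred - target||_2 / ||target||_2.

    Hypothetical helper; eps only guards against division by zero
    when the target field is identically zero.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same number of values")
    return np.linalg.norm(pred - target) / (np.linalg.norm(target) + eps)


if __name__ == "__main__":
    # Toy pressure values at two surface points, for illustration only.
    target = np.array([3.0, 4.0])
    pred = np.array([3.0, 5.0])
    print(relative_l2(pred, target))
```

Because the error is normalized by the magnitude of the target field, values are comparable across designs whose pressure magnitudes differ.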