Investigate the efficiency of incompressible flow simulations on CPUs and GPUs with BSAMR

Dewen Liu; Shuai He; Haoran Cheng; Yadong Zeng

TL;DR

This paper examines how software-level BSAMR parameters affect the computational efficiency of incompressible flow simulations. Using the IAMR code (built on AMReX), the authors run a parametric study with extensive CPU/GPU tests across multiple 2D/3D cases, varying Max_level, Max_grid_size, Regrid_interval, Cycling, and Skip_level_projection. The main contributions are empirical guidelines on how refinement depth, patch size, regrid frequency, and time-stepping strategy interact with hardware to influence performance, along with recommendations on when to use subcycling versus non-subcycling. These findings help practitioners tune BSAMR settings for speed, and the authors release open-source code and profiling data to support reproducibility and further research.

Abstract

Adaptive mesh refinement (AMR) is a classical technique that refines the mesh locally, only where needed, and thereby reduces the computational cost of HPC-based physics simulations. Although AMR has been used for many years, little reproducible research discusses how software-based parameters affect the efficiency of block-structured AMR (BSAMR) or how to choose them. This article conducts parametric studies of the computational efficiency of incompressible flow simulations on a block-structured adaptive mesh. The parameters include the refinement block size, the refinement frequency, the maximum level, and the cycling method. A new projection skipping (PS) method is proposed, which offers insight into when and where projections on coarser levels can safely be omitted. We conduct extensive tests on different CPUs and GPUs for various 2D/3D incompressible flow cases, including bubble, Rayleigh-Taylor (RT) instability, and Taylor-Green vortex cases, among others. Several empirical conclusions are obtained to help guide simulations with BSAMR. The code and all profiling data are available on GitHub.
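
To make the cycling method concrete, the following minimal Python sketch counts how many time-step advances each level performs while the coarsest level covers one coarse time step. It is an illustration under stated assumptions, not code from IAMR or AMReX: the function name `level_advances` and the parameter `ref_ratio` are hypothetical, the refinement ratio is assumed constant between levels, and the time step is assumed to scale with the cell size. With subcycling, each level uses its own time step; without subcycling, every level uses the time step of the finest level.

```python
def level_advances(max_level, ref_ratio=2, subcycling=True):
    """Count advances per level over one coarse-level time step.

    Hypothetical helper for illustration only (not part of IAMR/AMReX).
    Assumes a constant refinement ratio and a time step proportional
    to the cell size on each level.
    """
    # Number of finest-level steps needed to cover one coarse step.
    finest_steps = ref_ratio ** max_level
    advances = []
    for level in range(max_level + 1):
        if subcycling:
            # Level `level` takes ref_ratio**level substeps of its own size.
            advances.append(ref_ratio ** level)
        else:
            # All levels march together with the finest-level time step.
            advances.append(finest_steps)
    return advances


if __name__ == "__main__":
    for mode in (True, False):
        counts = level_advances(max_level=2, subcycling=mode)
        label = "subcycling" if mode else "non-subcycling"
        print(label, counts, "total:", sum(counts))
```

The count covers level advances only. It says nothing about the cost of each advance, the synchronization work between levels, or hardware effects, which are the factors the parametric study measures.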

Paper Structure (15 sections, 7 equations, 9 figures, 3 tables)

This paper contains 15 sections, 7 equations, 9 figures, 3 tables.

Introduction
Mathematical formulation
Projection-based Fluid Solver
Cycling method on the multiple levels
Open-source incompressible flow code and profiling data
Parameters related to BSAMR
Testing Cases
Results
CPU and GPU performance
Max_level
Max_grid_size
Cycling
Regrid_interval
Skip_level_projection
Conclusions and Future Directions

Figures (9)

Figure 1: Impact of multithread/multicore CPUs on the runtime of lid-driven cavity case
Figure 2: Impact of different GPUs on the runtime of lid-driven cavity case
Figure 3: Comparison of runtime on the CPU and GPU for the 3D lid-driven cavity case
Figure 4: Percentage of function call time on GPUs for the 3D lid-driven cavity case
Figure 5: Running time for various cases with different Max_level
...and 4 more figures
