Investigate the efficiency of incompressible flow simulations on CPUs and GPUs with BSAMR

Dewen Liu, Shuai He, Haoran Cheng, Yadong Zeng

TL;DR

This paper studies how software-level BSAMR parameters affect the computational efficiency of incompressible flow simulations. Using the IAMR code (built on AMReX), the authors run extensive CPU/GPU tests across multiple 2D/3D cases, comparing settings of Max_level, Max_grid_size, Regrid_interval, Cycling, and Skip_level_projection. Key contributions include empirical guidelines on how refinement depth, patch sizing, regrid cadence, and time-stepping strategy interact with hardware to influence performance, along with recommendations for when to use subcycling versus non-subcycling. The findings help practitioners tune BSAMR settings for speed, and the authors release open-source code and profiling data to support reproducibility and further research.

Abstract

Adaptive mesh refinement (AMR) is a classical technique that refines the mesh locally in space where needed, effectively reducing computational costs for HPC-based physics simulations. Although AMR has been used for many years, little reproducible research discusses how software-based parameters affect the efficiency of block-structured AMR (BSAMR) and how to choose them. This article presents parametric studies of the computational efficiency of incompressible flow simulations on a block-structured adaptive mesh. The parameters include refining block size, refining frequency, maximum level, and cycling method. A new projection skipping (PS) method is proposed, which offers insight into when and where projections on coarser levels can safely be omitted. We conduct extensive tests on different CPUs/GPUs for various 2D/3D incompressible flow cases, including bubble, Rayleigh-Taylor (RT) instability, and Taylor-Green vortex cases. Several empirical conclusions are obtained to help guide simulations with BSAMR. Code and all profiling data are available on GitHub.
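
As a rough illustration of why the cycling method matters for cost, the sketch below counts how many single-level time advances are needed to cover one coarse time step under subcycling and non-subcycling. It assumes a refinement ratio of 2 between levels and, for non-subcycling, a shared time step set by the finest level. The function names are hypothetical and are not taken from IAMR or AMReX.

```python
def subcycling_advances(max_level, ratio=2):
    """Advances per level over one coarse step when level l takes ratio**l substeps.

    Assumption: each finer level uses a time step `ratio` times smaller
    than the next coarser level.
    """
    return {lev: ratio ** lev for lev in range(max_level + 1)}


def non_subcycling_advances(max_level, ratio=2):
    """Advances per level over the same interval when all levels share the finest dt.

    Assumption: the shared time step equals the finest-level step, so every
    level advances ratio**max_level times.
    """
    steps = ratio ** max_level
    return {lev: steps for lev in range(max_level + 1)}


if __name__ == "__main__":
    for max_level in (1, 2, 3):
        sub = subcycling_advances(max_level)
        non = non_subcycling_advances(max_level)
        print(f"max_level={max_level}: "
              f"subcycling={sum(sub.values())} advances, "
              f"non-subcycling={sum(non.values())} advances")
```

These counts ignore how many cells live on each level and the cost of synchronization between levels, so they indicate only how the amount of per-level work scales with refinement depth, not which method runs faster in a given case.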

Paper Structure (15 sections, 7 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Impact of multithread/multicore CPUs on the runtime of lid-driven cavity case
  • Figure 2: Impact of different GPUs on the runtime of lid-driven cavity case
  • Figure 3: Comparison of runtime on the CPU and GPU for the 3D lid-driven cavity case
  • Figure 4: Percentage of function call time on GPUs for the 3D lid-driven cavity case
  • Figure 5: Running time for various cases with different Max_level
  • ...and 4 more figures