A GPU-Accelerated Matrix-Free FAS Multigrid Solver for Navier-Stokes Equations with Memory-Efficient Implementations

Jiale Meng; Shuqi Tang; Steven M. Wise; Zhenlin Guo

A GPU-Accelerated Matrix-Free FAS Multigrid Solver for Navier-Stokes Equations with Memory-Efficient Implementations

Jiale Meng, Shuqi Tang, Steven M. Wise, Zhenlin Guo

TL;DR

The paper introduces a memory-efficient, GPU-accelerated matrix-free Full Approximation Storage (FAS) multigrid solver for Navier–Stokes and multiphase problems on staggered grids, implemented in MATLAB. A novel X-shape Multi-Color Gauss–Seidel (X-MCGS) smoother eliminates branching and enhances GPU parallelism, while inter-grid transfer and smoothing components are fully GPU-accelerated. Memory reuse strategies reduce GPU resident variables by up to ~60% (e.g., from 12–15 to 8), enabling large 2D and 3D simulations (up to $512^3$) on a single RTX GPU, with substantial speedups over CPU baselines (e.g., $61\times$ in 2D at $8192^2$, $46\times$ in 3D at $512^3$). The framework is validated on grain growth, vesicle phase separation, and air–water two-bubble coalescence, demonstrating both accuracy and scalability, and lays groundwork for future multi-GPU extensions.

Abstract

We develop a matrix-free Full Approximation Storage (FAS) multigrid solver based on staggered finite differences and implemented on GPU in MATLAB. To enhance performance, intermediate variables are reused, and an X-shape Multi-Color Gauss-Seidel (X-MCGS) smoother is introduced, which eliminates conditional branching by partitioning the grid into four submatrices. Restriction and prolongation operators are also GPU-accelerated. Convergence tests verify robustness and accuracy, while benchmarks show substantial speedups: for the 2D heat equation on an $8192^2$ grid, the RTX~4090 achieves $61\times$ over a single-core CPU, and in 3D at $512^3$, $46\times$. A memory-efficient implementation of first- and second-order projection schemes reduces GPU-resident variables from 12/15 to 8, lowering memory footprint and improving performance by 20--30%, enabling $512^3$ Navier-Stokes simulations on a single GPU. Grain growth on a $512^2$ grid accommodates up to $q=1189$ (2D) and $q=123$ (3D) orientations, reproducing expected scaling laws. Coupled with Cahn-Hilliard equations, air-water two-bubble coalescence is simulated on a $256\times 256\times 1024$ grid, agreeing with experimental observations.

A GPU-Accelerated Matrix-Free FAS Multigrid Solver for Navier-Stokes Equations with Memory-Efficient Implementations

TL;DR

) on a single RTX GPU, with substantial speedups over CPU baselines (e.g.,

in 2D at

in 3D at

). The framework is validated on grain growth, vesicle phase separation, and air–water two-bubble coalescence, demonstrating both accuracy and scalability, and lays groundwork for future multi-GPU extensions.

Abstract

grid, the RTX~4090 achieves

over a single-core CPU, and in 3D at

. A memory-efficient implementation of first- and second-order projection schemes reduces GPU-resident variables from 12/15 to 8, lowering memory footprint and improving performance by 20--30%, enabling

Navier-Stokes simulations on a single GPU. Grain growth on a

grid accommodates up to

(2D) and

(3D) orientations, reproducing expected scaling laws. Coupled with Cahn-Hilliard equations, air-water two-bubble coalescence is simulated on a

grid, agreeing with experimental observations.

A GPU-Accelerated Matrix-Free FAS Multigrid Solver for Navier-Stokes Equations with Memory-Efficient Implementations

TL;DR

Abstract

A GPU-Accelerated Matrix-Free FAS Multigrid Solver for Navier-Stokes Equations with Memory-Efficient Implementations

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (20)

Theorems & Definitions (3)