Preconditioning of the generalized Stokes problem arising from the approximation of the time-dependent Navier-Stokes equations

Melvin Creff, Jean-Luc Guermond

TL;DR

The study evaluates preconditioning strategies for the generalized Stokes problem that arises from time discretization of the time-dependent Navier–Stokes equations, including pressure Schur-complement, fully coupled, and augmented Lagrangian variants. Using finite-element discretizations and large-scale 2D tests, it demonstrates that throughput differences between Schur- and full-system approaches are small, and that augmented Lagrangian preconditioners offer limited throughput benefits due to costly velocity solves. All methods tested are, on average, about 25 times slower than traditional pressure-correction methods, indicating that these approaches, while efficient for steady problems, are not competitive for time-dependent simulations in their current form. The authors suggest that new algorithms that fuse time and space considerations are necessary to match the efficiency of projection-based time stepping for Navier–Stokes problems. This work provides a comprehensive, numerically rigorous benchmark of preconditioners for generalized Stokes problems and clarifies the practical trade-offs in parallel throughput and iteration count under realistic HPC settings.
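For orientation, the problem class named above can be written in a standard textbook form. The notation here is assumed for illustration and may differ from the paper's: $\tau$ is the time step, $\mu$ the viscosity, $M_V$ and $K_V$ the velocity mass and stiffness matrices, $M_Q$ the pressure mass matrix, $B$ the discrete divergence, and $\lambda \ge 0$ an augmentation parameter. The exact coefficient in front of the mass term depends on the time-stepping scheme, and sign conventions for $B$ vary.

```latex
% Generalized Stokes problem (one implicit time step, nonlinear term made explicit)
\frac{1}{\tau}\,\mathbf{u} - \mu\,\Delta \mathbf{u} + \nabla p = \mathbf{f},
\qquad \nabla\cdot\mathbf{u} = 0 .

% Fully coupled discrete saddle-point system
\begin{pmatrix} A & B^{\mathsf T} \\ B & 0 \end{pmatrix}
\begin{pmatrix} U \\ P \end{pmatrix}
=
\begin{pmatrix} F \\ 0 \end{pmatrix},
\qquad A := \frac{1}{\tau} M_V + \mu K_V .

% Pressure Schur complement obtained by eliminating the velocity
S := B A^{-1} B^{\mathsf T} .

% Augmented Lagrangian variant: same solution when BU = 0, modified velocity block
A_\lambda := A + \lambda\, B^{\mathsf T} M_Q^{-1} B .
```

In this form, the "Schur" strategies iterate on $S$ while the "fully coupled" strategies iterate on the whole block matrix; the augmentation term improves pressure conditioning at the price of a harder velocity solve, which is consistent with the throughput trade-off reported in the summary.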

Abstract

The paper compares standard iterative methods for solving the generalized Stokes problem arising from the time and space approximation of the time-dependent incompressible Navier-Stokes equations. Various preconditioning techniques are considered (Schur complement, fully coupled system, with and without augmented Lagrangian). One investigates whether these methods can compete with traditional pressure-correction and velocity-correction methods in terms of throughput (number of degrees of freedom per time step per core per second). Numerical tests on fine unstructured meshes (68 million degrees of freedom) demonstrate GMRES/CG convergence rates that are independent of the mesh size and improve with the Reynolds number for most methods. Three conclusions are drawn: (1) Solving the pressure Schur complement problem rather than the fully coupled system makes no significant difference in terms of throughput. (2) Although very good parallel scalability is observed for the augmented Lagrangian method, the best throughput is achieved without using the augmented Lagrangian formulation. (3) The throughput of all the methods tested in the paper is on average 25 times lower than that of traditional pressure-correction and velocity-correction methods. Hence, although all these methods are very efficient for solving steady-state problems, none of them is unfortunately competitive for solving time-dependent problems.
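The throughput metric used in the abstract can be sketched in a few lines of Python. This is a minimal illustration under one plausible reading of "degrees of freedom per time step per core per second"; the function names `throughput` and `slowdown` are hypothetical, and the core count and timing in the usage lines are made-up inputs, not measurements from the paper.

```python
def throughput(n_dofs: int, n_steps: int, n_cores: int, wall_seconds: float) -> float:
    """Degrees of freedom advanced per time step, per core, per second.

    Assumed normalization: n_dofs * n_steps / (n_cores * wall_seconds).
    """
    if n_dofs <= 0 or n_steps <= 0 or n_cores <= 0 or wall_seconds <= 0:
        raise ValueError("all inputs must be positive")
    return n_dofs * n_steps / (n_cores * wall_seconds)


def slowdown(reference: float, candidate: float) -> float:
    """How many times lower the candidate throughput is than the reference."""
    if reference <= 0 or candidate <= 0:
        raise ValueError("throughputs must be positive")
    return reference / candidate


# Illustrative inputs only: 68 million dofs (mesh size quoted in the abstract),
# with a hypothetical core count and a hypothetical wall-clock time per step.
coupled = throughput(n_dofs=68_000_000, n_steps=1, n_cores=1024, wall_seconds=10.0)

# A method 25 times faster per step would have a 25 times higher throughput.
projection = throughput(n_dofs=68_000_000, n_steps=1, n_cores=1024, wall_seconds=10.0 / 25)
ratio = slowdown(projection, coupled)
```

Normalizing by core count and by time step is what makes the comparison across solvers meaningful: a method with fewer GMRES iterations can still have lower throughput if each iteration requires an expensive velocity solve.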

Paper Structure

This paper contains 35 sections, 2 theorems, 36 equations, 7 figures, 8 tables.

Key Result

Proposition 2.1

Let $\sigma(S)$ be the spectrum of $S$. Let $\mu_{\min}$, $\mu_{\max}$ be the smallest and largest eigenvalues of the pressure mass matrix $M_Q$. Let $\|a\|$ and $\|b\|$ be the norms of the bilinear forms $a$ and $b$. Let $\alpha_h$ be the coercivity constant of $a$. Then $\sigma(S) \subset [\ldots]$

Figures (7)

  • Figure 1: Comparison of the four preconditioners $(C_0)_{{\mathsf{th}}}$, $(C_0)_{{\mathsf{2Vc}}}$, $(C_0^\Lambda)_{{\mathsf{th}}}$, and $(C_0^\Lambda)_{{\mathsf{2Vc}}}$. Y-axis: throughput in kdofs per s for the top row and number of GMRES iterations for the bottom row. X-axis: total number of dofs. Left column: $\mu=1.$ Center column: $\mu=10^{-2}.$ Right column: $\mu=10^{-4}.$
  • Figure 2: GMRES residual vs. iteration count for the preconditioner $(C_0^\Delta)_{{\mathsf{th}}}$ with $\mu=1$ (blue), $\mu=10^{-2}$ (red), and $\mu=10^{-4}$ (green). Number of velocity grid points: 118,785; 473,857; 1,892,865; 7,566,337 (each shown with a different line style).
  • Figure 3: Comparison of the three preconditioners $(C_{0}^\Lambda)_{{\mathsf{2Vc}}}$, $(C_{1}^\Lambda)_{{\mathsf{2Vc}}}$, $(C_{10}^\Lambda)_{{\mathsf{2Vc}}}$. Y-axis: throughput in kdofs per s for the top row and number of GMRES iterations for the bottom row. X-axis: total number of dofs. Left column: $\mu=1.$ Center column: $\mu=10^{-2}.$ Right column: $\mu=10^{-4}.$
  • Figure 4: Comparison of the following four preconditioner pairs for the full matrix ${\mathbb A}_0$: $((C_{0}^\Lambda)_{{\mathsf{th}}},({\widetilde{A}}_{0,2})_{{\mathsf{th}}}^{-1})$, $((C_{0}^\Lambda)_{{\mathsf{2Vc}}},({\widetilde{A}}_{0,2})_{{\mathsf{2Vc}}}^{-1})$, $((C_{0}^\Lambda)_{{\mathsf{th}}},({\widetilde{A}}_{0,3})_{{\mathsf{th}}}^{-1})$, and $((C_{0}^\Lambda)_{{\mathsf{2Vc}}},({\widetilde{A}}_{0,3})_{{\mathsf{2Vc}}}^{-1})$. Y-axis: throughput in kdofs per s for the top row, and number of GMRES iterations for the bottom row. X-axis: total number of dofs. Left column: $\mu=1.$ Center column: $\mu=10^{-2}.$ Right column: $\mu=10^{-4}.$
  • Figure 5: Left: "incomplete Full" $(C^\Lambda_0)_{{\mathsf{2Vc}}}, ({\widetilde{A}}_{0,2})_{{\mathsf{2Vc}}}$ and "Full" $(C^\Lambda_0)_{{\mathsf{2Vc}}}, ({\widetilde{A}}_{0,2})_{{\mathsf{2Vc}}}$. Right: "incomplete Full" $(C^\Lambda_0)_{{\mathsf{2Vc}}}, ({\widetilde{A}}_{0,3})_{{\mathsf{2Vc}}}$ and "Full" $(C^\Lambda_0)_{{\mathsf{2Vc}}}, ({\widetilde{A}}_{0,3})_{{\mathsf{2Vc}}}$. Y-axis: throughput in kdofs per s. X-axis: total number of dofs.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Proposition 2.1: Spectrum of $S$
  • Proposition 2.2: Augmented Lagrangian
  • Remark 5.1: Replacing $BM_q^{-1}B^{\mathsf T}$ by $L_Q^{-1}$
  • Remark 6.1: Alternative preconditioner