Table of Contents
Fetching ...

Multigrid with Linear Storage Complexity

Daniel Bauer, Nils Kohl, Stephen F. McCormick, Rasmus Tamstorf

TL;DR

This work tackles the memory bottleneck in discretization-error accurate PDE solvers by developing a full multigrid method that stores the solution and intermediates in a compact, regressive-precision format, achieving $O(n)$ storage instead of the conventional $O(n\log n)$. The method uses a compact representation across multigrid levels, intertwined with a compact full approximation scheme and matrix-free operations to preserve linear arithmetic cost. Key contributions include a rigorous precision framework, memory-optimized algorithms for residuals and corrections, and a detailed cost analysis showing linear storage and near-linear computation. Numerical experiments on Poisson and biharmonic models show the solution can be stored with as few as a handful of bits per DoF while maintaining discretization-error accuracy, implying substantial memory savings on memory-constrained hardware. The approach promises scalable PDE solving on next-generation HPC systems, with potential extensions to parallel implementations and broader PDE classes.

Abstract

As the discretization error for the solution of a partial differential equation (PDE) decreases, the precision required to store the corresponding coefficients naturally increases. Storing the solution's finite element coefficients explicitly requires $\mathcal O(n \log n)$ bits of storage, where $n$ is the number of degrees of freedom (DoFs). This paper presents a full multigrid method to compute the solution in a compressed format that reduces the storage complexity of the solution and intermediate vectors to $\mathcal O(n)$ bits. This reduction allows a matrix-free implementation to solve elliptic PDEs with an overall linear space complexity. For problems limited by the memory capacity of current supercomputers, we expect a memory footprint reduction of about an order of magnitude compared to state-of-the-art mixed-precision methods. We demonstrate the applicability of our algorithm by solving two model problems. Depending on the PDE and polynomial degree, but irrespective of the problem size, the solution vector on the finest grid requires between 4 and 12 bits per DoF, and the residual and correction require 3 to 6 bits each. Additional data is stored on the coarse grids with modestly increasing bit widths toward coarser grids.

Multigrid with Linear Storage Complexity

TL;DR

This work tackles the memory bottleneck in discretization-error accurate PDE solvers by developing a full multigrid method that stores the solution and intermediates in a compact, regressive-precision format, achieving storage instead of the conventional . The method uses a compact representation across multigrid levels, intertwined with a compact full approximation scheme and matrix-free operations to preserve linear arithmetic cost. Key contributions include a rigorous precision framework, memory-optimized algorithms for residuals and corrections, and a detailed cost analysis showing linear storage and near-linear computation. Numerical experiments on Poisson and biharmonic models show the solution can be stored with as few as a handful of bits per DoF while maintaining discretization-error accuracy, implying substantial memory savings on memory-constrained hardware. The approach promises scalable PDE solving on next-generation HPC systems, with potential extensions to parallel implementations and broader PDE classes.

Abstract

As the discretization error for the solution of a partial differential equation (PDE) decreases, the precision required to store the corresponding coefficients naturally increases. Storing the solution's finite element coefficients explicitly requires bits of storage, where is the number of degrees of freedom (DoFs). This paper presents a full multigrid method to compute the solution in a compressed format that reduces the storage complexity of the solution and intermediate vectors to bits. This reduction allows a matrix-free implementation to solve elliptic PDEs with an overall linear space complexity. For problems limited by the memory capacity of current supercomputers, we expect a memory footprint reduction of about an order of magnitude compared to state-of-the-art mixed-precision methods. We demonstrate the applicability of our algorithm by solving two model problems. Depending on the PDE and polynomial degree, but irrespective of the problem size, the solution vector on the finest grid requires between 4 and 12 bits per DoF, and the residual and correction require 3 to 6 bits each. Additional data is stored on the coarse grids with modestly increasing bit widths toward coarser grids.

Paper Structure

This paper contains 35 sections, 1 theorem, 27 equations, 5 figures, 2 tables, 6 algorithms.

Key Result

Lemma 1

Consider a second-order pde, written in weak form as where $a(u,v)$ is a symmetric $H^1(\Omega)$ bounded and coercive bilinear form and $b(v)$ is a linear form, both derived from weakening the strong form $\mathcal{L} u = f$ of the pde. Consider the energy norm $a(\cdot,\cdot)^{1/2}$. As $L \to \infty$, the difference between the best energy approxima

Figures (5)

  • Figure 1: Solving the homogeneous Poisson equation on the unit interval for the manufactured solution $u(x) = x(1-x) \cos(\frac{\pi}{2} x)$. The problem is discretized with B-spline finite elements of degree $p$ and solved by a "textbook" *fmg solver executed in an a priori fixed floating point precision. The error ceases to decrease with the expected rate of $\mathcal{O}(h^p)$ as the limit of the finite precision is reached. A higher precision allows for finer grids, but any fixed bit-width constrains the attainable accuracy.
  • Figure 1: An example function discretized with piecewise-linear elements. The basis on level 2.0 is denoted by the row vector $\Phi_2(x)$. In standard representation, only the finest grid is used. In contrast, the compact vector consists of a coarse approximation and two successively finer corrections. Crucially, the magnitude of the corrections decreases toward increasingly finer grids, allowing them to be stored in lower precision without significant loss of accuracy.
  • Figure 1: Estimated memory savings of compact multigrid compared to discretization-error accurate fmg in progressive bfp precision kohl_multigrid_2024 (current state-of-the-art) for varying polynomial degrees. Here, we calculate with the experimentally determined constants from \ref{['tab:experiments-params']}. A marker is placed every second refinemet level.
  • Figure 1: Relative $H^m$-error $\|\tilde{u}_L - u\|_{H^m} / \|u\|_{H^m}$ of the computed compact solution $\tilde{u}_L$ to Poisson's equation \ref{['eq:pde-poisson']} ($m = 1$, $d \in \{1, 2\}$) and the biharmonic equation \ref{['eq:pde-biharmonic']} ($m = 2$, $d=1$) discretized with B-splines of degree $p$ (using parameters from \ref{['tab:experiments-params']}). The numerical results show the expected convergence rate $\mathcal{O}(h^{p - m + 1})$ consistent with theoretical results, where $h=2^{-L}$. It is worth comparing these results to \ref{['fig:hockey-sticks']} to get an idea of the limits of common IEEE floating-point formats.
  • Figure 2: Sketch of a vector in standard representation (progressive precision) and compact representation (regressive precision). The level index increases from bottom to top, and bits become less significant from left to right. In the compact case, all sections $\acute{u}_{0\rng L}$ are stored as a whole, while in standard representation the finest grid vector represents the entire solution. The bit width of the finest compact section is independent of $L$ while the precisions of all coarser sections increase with $L$. The shown bit-values are made up and serve only illustrative purposes.

Theorems & Definitions (2)

  • Lemma 1
  • Proof 1