Neural Multigrid Architectures

Vladimir Fanaskov

TL;DR

The paper develops a matrix-free, convolution-based neural multigrid framework that maps geometric multigrid solvers onto trainable neural architectures. By minimizing the spectral radius of the error-propagation operator and serializing layers so that the network grows with grid refinement, the approach yields robust solvers for several 2D Poisson discretizations, notably when learned smoothing is used in serialized or multi-layer designs. The study finds that architectures with learned smoothing (e.g., s$1$MG(s), s$3$MG(s)) can outperform a baseline multigrid, while fixed-size architectures (U-Net, fMG) struggle on larger grids; naive serialization may fail for some configurations. The work highlights the potential and challenges of integrating neural components into scalable, matrix-free multigrid solvers and points to future directions in loss design and pretrained-layer assembly to enhance robustness across grid refinements.

Abstract

We propose a convenient matrix-free neural architecture for the multigrid method. The architecture is simple enough to be implemented in less than fifty lines of code, yet it encompasses a large number of distinct multigrid solvers. We argue that a fixed neural network without dense layers cannot realize an efficient iterative method. Because of that, standard training protocols do not lead to competitive solvers. To overcome this difficulty, we use parameter sharing and serialization of layers. The resulting network can be trained on linear problems with thousands of unknowns and retains its efficiency on problems with millions of unknowns. From the point of view of numerical linear algebra, the network's training corresponds to finding optimal smoothers for the geometric multigrid method. We demonstrate our approach on a few second-order elliptic equations. For the tested linear systems, we obtain a spectral radius of the error propagation matrix that is two to five times smaller than that of a basic linear multigrid with a Jacobi smoother.
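The quantity the abstract compares, the spectral radius of the error propagation matrix, can be illustrated on a toy problem. The sketch below is not the paper's code: it assumes a 1D Poisson matrix, a full-weighting restriction `P`, a Galerkin coarse operator `P A P^T`, and a weighted Jacobi smoother with a hand-picked `omega`, and it forms dense matrices only so that the eigenvalues can be computed directly. All function names are hypothetical.

```python
import numpy as np

def poisson_1d(n):
    """Dense 1D Laplacian with stencil [-1, 2, -1] / h^2 (illustrative model problem)."""
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def restriction_1d(n):
    """Full-weighting restriction (assumed kernel [1/4, 1/2, 1/4], stride 2); n must be odd."""
    nc = (n - 1) // 2
    P = np.zeros((nc, n))
    for m in range(nc):
        P[m, 2 * m:2 * m + 3] = [0.25, 0.5, 0.25]
    return P

def two_grid_spectral_radius(n=31, omega=2.0 / 3.0, n_smooth=1):
    """Spectral radius of E = (I - P^T A_c^{-1} P A) S^n_smooth for weighted Jacobi S."""
    A = poisson_1d(n)
    P = restriction_1d(n)
    A_c = P @ A @ P.T                                # Galerkin coarse-grid operator
    I = np.eye(n)
    S = I - omega * np.diag(1.0 / np.diag(A)) @ A    # weighted Jacobi error propagation
    K = I - P.T @ np.linalg.solve(A_c, P @ A)        # coarse-grid correction
    E = K @ np.linalg.matrix_power(S, n_smooth)      # smoothing followed by correction
    return np.max(np.abs(np.linalg.eigvals(E)))

rho = two_grid_spectral_radius()
print(f"two-grid spectral radius: {rho:.4f}")
```

In this picture, learning a smoother amounts to replacing the fixed Jacobi matrix `S` with a trainable operator and adjusting its parameters so that the spectral radius of `E` decreases.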

Paper Structure (24 sections, 2 theorems, 22 equations, 4 figures, 7 tables, 1 algorithm)

Introduction
Multigrid method
Matrix-free multigrid architecture
Loss function
Restriction on architecture for linear iterative methods
Architectures and a baseline solver
LMG
s$1$MG(rs)
s$1$MG(s)
s$3$MG(s)
U-Net
fMG
Experiments
Model equations
Poisson equation
...and 9 more sections

Key Result

Proposition 1

Let $A_{H}$ be the matrix of the linear problem (nmg:eq:Laplace_point_source) obtained using the finite element method on a given grid with spacing $\simeq H$. Let $\mathcal{N}$ be a neural network that consists of a finite number of (local) convolutional layers. Footnote: We exclude nonlocal convolutions based on graph Lap…

Figures (4)

Figure 1: Because $A_{i+1} = P_{i}A_{i}P_{i}^{T}$, the product $A_{3} x_{3}$ can be computed as a set of convolutions (dashed lines) and transposed convolutions (solid lines), together with one application of the operator on the fine grid (double line); $w_{i}$ denotes the convolution kernels.
Figure 2: When a delta function appears as the right-hand side of a continuous problem, the weak form results in a sparse right-hand side because only a small number of (shaded) tent functions feel the presence of the source (denoted by a point).
Figure 3: The figure shows how the receptive field of a fixed architecture with only local layers (shaded) changes after refinement. Since convolutions are performed on discrete data, information from the dot in the middle can spread only over a smaller region (enclosed by the circle) in physical space. This limits the ability of a neural network with fixed architecture to generalize.
Figure 4: U-Net architecture with two layers. Convolutions with all strides equal to $1$ are denoted by double lines (they correspond to smoothing in the multigrid architecture), a single line represents a convolution with at least one stride $>1$, a dashed line is the transpose of this operation, and a curved line is a skip connection (copy and add).
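The recursion in the Figure 1 caption, $A_{i+1} = P_{i}A_{i}P_{i}^{T}$, means a coarse operator never has to be stored: it can be applied by chaining transposed convolutions, one fine-grid stencil, and strided convolutions. The following is a minimal 1D sketch of that idea, not the paper's implementation. It assumes a fixed kernel `W = [1/4, 1/2, 1/4]` with stride 2 at every level and a Dirichlet Laplacian as the fine-grid operator; the function names are hypothetical.

```python
import numpy as np

W = np.array([0.25, 0.5, 0.25])  # assumed restriction kernel, shared by all levels

def restrict(v, w=W):
    """Stride-2 convolution (action of P): odd length n -> length (n - 1) // 2."""
    return w[0] * v[0:-2:2] + w[1] * v[1:-1:2] + w[2] * v[2::2]

def prolong(vc, w=W):
    """Transposed stride-2 convolution (action of P^T): length nc -> 2 * nc + 1."""
    v = np.zeros(2 * len(vc) + 1)
    v[0:-2:2] += w[0] * vc
    v[1:-1:2] += w[1] * vc
    v[2::2] += w[2] * vc
    return v

def apply_fine(v):
    """Matrix-free fine-grid operator: stencil [-1, 2, -1] / h^2, zero boundary values."""
    h = 1.0 / (len(v) + 1)
    padded = np.pad(v, 1)
    return (2 * v - padded[:-2] - padded[2:]) / h**2

def apply_coarse(x, levels):
    """Apply the Galerkin coarse operator P ... P A P^T ... P^T without forming it."""
    y = x
    for _ in range(levels):
        y = prolong(y)       # transposed convolutions carry x up to the fine grid
    y = apply_fine(y)        # single application of the fine-grid operator
    for _ in range(levels):
        y = restrict(y)      # convolutions carry the result back down
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(7)           # coarse vector; the fine grid has 31 points
y = apply_coarse(x, levels=2)
print(y.shape)
```

In a trainable version the kernel at each level would be a learnable parameter (the $w_{i}$ of the caption) rather than the fixed `W` used here.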

Theorems & Definitions (8)

Proposition
proof
Remark 1
Remark 2
Corollary
proof
Remark 3
Remark 4
