Poly-MgNet: Polynomial Building Blocks in Multigrid-Inspired ResNets

Antonia van Betteray, Matthias Rottmann, Karsten Kahl

TL;DR

The paper tackles the high parameter count of ResNets by embedding MG-inspired polynomial smoothers into MgNet to create Poly-MgNet, reducing the number of weights while preserving accuracy. It replaces or augments standard smoothing with a polynomial operator $p_d(A)=\sum_{i=0}^d \alpha_i A^i$, leading to residual updates $u \leftarrow u + p_d(A)(f-Au)$ and a residual propagation factor $q_{d+1}(A)=I-Ap_d(A)$, whose roots are chosen from the spectrum to control convergence. Empirically, Poly-MgNet achieves strong accuracy with substantially fewer parameters on CIFAR-10 (e.g., Poly-MgNet$^{q_2}$ with around $1.3$M weights vs ResNet18's $11.2$M) and demonstrates that polynomial blocks based on real and complex roots can further improve the accuracy–weight trade-off relative to ResNet and MgNet baselines. The work provides design guidelines for integrating MG smoothing into CNNs, including root selection, placement of ReLU and batch normalization, and channel scaling, illustrating that MG-inspired weight sharing can yield competitive performance with much smaller models.
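The relation between the update $u \leftarrow u + p_d(A)(f-Au)$ and the factor $q_{d+1}(A)=I-Ap_d(A)$ can be checked numerically on a toy linear system. The following is a minimal NumPy sketch, not the authors' implementation: it assumes a small symmetric positive definite matrix in place of a convolution, omits ReLU and batch normalization, and realizes the polynomial as one step $u \leftarrow u + \lambda_i^{-1}(f-Au)$ per root $\lambda_i$, so that $q(A)=\prod_i (I - A/\lambda_i)$. The names `poly_smooth` and `q_of_A` are hypothetical.

```python
import numpy as np

def poly_smooth(A, f, u, roots):
    """Apply one step u <- u + (1/root) * (f - A u) for each root."""
    for lam in roots:
        u = u + (f - A @ u) / lam
    return u

def q_of_A(A, roots):
    """Residual propagation factor q(A) = prod_i (I - A / root_i)."""
    n = A.shape[0]
    Q = np.eye(n)
    for lam in roots:
        Q = Q @ (np.eye(n) - A / lam)
    return Q

# Toy setup (assumption): a dense SPD matrix stands in for the operator A.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6.0 * np.eye(6)
f = rng.standard_normal(6)
u0 = rng.standard_normal(6)

# Roots of q_2 taken from the spectrum: smallest and largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(A)
roots = [eigvals[0], eigvals[-1]]

u1 = poly_smooth(A, f, u0, roots)
r0 = f - A @ u0          # residual before smoothing
r1 = f - A @ u1          # residual after smoothing

Q = q_of_A(A, roots)
# p_1(A) = A^{-1} (I - q_2(A)), recovered from q_2(A) = I - A p_1(A).
P = np.linalg.solve(A, np.eye(6) - Q)

print(np.allclose(r1, Q @ r0))        # residual is propagated by q_2(A)
print(np.allclose(u1, u0 + P @ r0))   # update equals u0 + p_1(A) r0
```

Because the roots sit at two eigenvalues, the components of the residual along the corresponding eigenvectors are removed after the two steps, which is the sense in which the choice of roots controls convergence in this linear toy setting.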

Abstract

The structural analogies between ResNets and Multigrid (MG) methods, such as common building blocks like convolutions and poolings, were already pointed out by He et al. in 2016. Multigrid methods are used in the context of scientific computing for solving large sparse linear systems arising from partial differential equations. MG methods particularly rely on two main concepts: smoothing and residual restriction / coarsening. Exploiting these analogies, He and Xu developed the MgNet framework, which integrates MG schemes into the design of ResNets. In this work, we introduce a novel neural network building block inspired by polynomial smoothers from MG theory. From an MG perspective, our polynomial block naturally extends the MgNet framework to Poly-MgNet and at the same time reduces the number of weights in MgNet. We present a comprehensive study of our polynomial block, analyzing the choice of initial coefficients, the polynomial degree, and the placement of activation functions and batch normalizations. Our results demonstrate that constructing (quadratic) polynomial building blocks based on real and imaginary polynomial roots enhances Poly-MgNet's capacity in terms of accuracy. Furthermore, our approach achieves an improved trade-off between model accuracy and number of weights compared to ResNet as well as to specific configurations of MgNet.

Paper Structure

This paper contains 18 sections, 23 equations, 5 figures, 4 tables, 3 algorithms.

Figures (5)

  • Figure 1: Weight sharing in ResNet and MgNet: ResNet blocks, no weight sharing; MgNet blocks, shared $A_\ell$; MgNet blocks, shared layers $A_\ell$ and $B_\ell$; and Poly-MgNet.
  • Figure 2: Data-feature relations $A_\ell$ and $B_\ell$ on resolution level $\ell$ followed by transfer to coarser resolution $\ell+1$. $A_\ell$ applied to the features $u_\ell$, calculation of the residual $r_\ell = f_\ell - A_\ell u_\ell$, on which the feature extractor $B_\ell$ is applied.
  • Figure 3: Surface representation over the spectrum $\Lambda$ of the corresponding matrix $A \in \mathbb{R}^{64 \times 64 \times 3 \times 3}$. The $x$-axis represents the real part, while the $y$-axis corresponds to the imaginary part. The $z$-axis shows the amplitude $\operatorname{abs}(q_4(\Lambda))$ of the polynomial $q_4$, which has roots at the eigenvalues with minimal and maximal real part, as well as at the complex-conjugate pair of eigenvalues with maximal imaginary part. This visualization allows for an intuitive identification of the maximal amplitude over the spectrum.
  • Figure 4: Schematic illustration of polynomials $q_2$ (two blocks) with different choices of roots from an exemplary (real) spectrum.
  • Figure 5: Accuracy-weight trade-off of ResNet and MgNet models: influence of the overall capacity of residual networks on classification accuracy. The number of channels in the residual blocks of ResNet and MgNet$^{\rm A, B}$ is scaled by $1/\sqrt{2}$ and $1/\sqrt{8}$. In contrast, the number of channels of Poly-MgNet$^{q_d}$ is scaled by $\sqrt{2}$ and $\sqrt{8}$.
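The root selection described in the caption of Figure 3 can be illustrated on a small example. The sketch below is only an illustration under stated assumptions: it uses a hand-built $6 \times 6$ block-diagonal real matrix with known eigenvalues instead of the convolution operator from the paper, and the helper `q` is a hypothetical name. It picks the eigenvalues with minimal and maximal real part plus the complex-conjugate pair with maximal imaginary part as roots of $q_4(x)=\prod_i (1 - x/\lambda_i)$ and evaluates $\operatorname{abs}(q_4)$ on the spectrum.

```python
import numpy as np

def q(x, roots):
    """Evaluate q(x) = prod_i (1 - x / root_i); x may be a scalar or an array."""
    return np.prod([1 - x / r for r in roots], axis=0)

# Toy operator (assumption): real block-diagonal matrix with eigenvalues
# 1, 4, 2 +/- 1.5i and 3 +/- 0.5i.
A = np.zeros((6, 6))
A[0, 0] = 1.0
A[1, 1] = 4.0
A[2:4, 2:4] = [[2.0, -1.5], [1.5, 2.0]]
A[4:6, 4:6] = [[3.0, -0.5], [0.5, 3.0]]

spectrum = np.linalg.eigvals(A)

lam_min = spectrum[np.argmin(spectrum.real)]   # minimal real part
lam_max = spectrum[np.argmax(spectrum.real)]   # maximal real part
mu = spectrum[np.argmax(spectrum.imag)]        # maximal imaginary part
roots = [lam_min, lam_max, mu, np.conj(mu)]    # four roots -> q_4

amplitude = np.abs(q(spectrum, roots))         # abs(q_4) on the spectrum
for lam, amp in zip(spectrum, amplitude):
    print(f"eigenvalue {lam:.2f}: |q_4| = {amp:.3e}")

# A conjugate pair of roots gives a quadratic factor with real coefficients:
# (1 - x/mu)(1 - x/conj(mu)) = 1 - c1 * x + c2 * x**2
c1 = 2 * mu.real / abs(mu) ** 2
c2 = 1 / abs(mu) ** 2
```

Since the conjugate pair combines into a quadratic factor with real coefficients `c1` and `c2`, complex roots do not require complex arithmetic when the factor is applied to real data.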