
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks

Zixuan Zhang, Kaiqi Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang

TL;DR

This work analyzes overparameterized ConvResNeXts with weight decay for nonparametric binary classification on data supported near a low-dimensional manifold. It develops approximation and estimation theory for Besov-smooth targets on manifolds, showing that these networks adapt to local smoothness and intrinsic dimension while avoiding the curse of dimensionality; deeper networks can approach minimax rates, and architectural flexibility with parallel paths is theoretically justified. The key contributions include a Besov-based universal approximation bound, a Dudley-based covering-number analysis to bound the local Gaussian complexity, and near-minimax risk rates achieved by balancing depth and width in an overparameterized regime. The results offer theoretical justification for the empirical success of deep, overparameterized ConvResNets and ConvResNeXts and provide architectural insights into depth-width tradeoffs and the sparsity induced by weight decay.

Abstract

Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.
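The claim that weight decay implicitly enforces sparsity on the building blocks can be illustrated with a standard rescaling argument. The sketch below assumes bias-free ReLU blocks, for which multiplying layer $\ell$ by $s_\ell > 0$ with $\prod_\ell s_\ell = 1$ leaves the block's output unchanged, so weight decay can always be lowered by balancing the layer norms. The function names are hypothetical, and treating $c_m = \prod_\ell \|W_{m,\ell}\|_F$ as the size of block $m$'s contribution is a toy simplification, not the paper's proof.

```python
import numpy as np


def balanced_weight_decay(norms):
    """Smallest weight-decay penalty sum_l ||W_l||_F^2 over positive rescalings
    W_l -> s_l * W_l with prod_l s_l = 1 (these leave a bias-free ReLU block's
    output unchanged). By AM-GM the minimum is L * (prod_l ||W_l||_F)**(2/L),
    reached when every layer is rescaled to the same norm."""
    norms = np.asarray(norms, dtype=float)
    L = norms.size
    common = np.prod(norms) ** (1.0 / L)   # shared norm after balancing
    scales = common / norms                 # rescaling factors; their product is 1
    return float(np.sum((scales * norms) ** 2)), scales


def total_penalty(block_scales, L):
    """Balanced penalty summed over M parallel blocks, where block m has overall
    scale c_m = prod_l ||W_{m,l}||_F. Each term is L * c_m**(2/L)."""
    c = np.asarray(block_scales, dtype=float)
    return float(L * np.sum(c ** (2.0 / L)))


# One block with L = 3 layers of norms 4, 1, 0.25 (overall scale 1).
raw = np.array([4.0, 1.0, 0.25])
print(np.sum(raw ** 2))                  # unbalanced penalty: 17.0625
penalty, scales = balanced_weight_decay(raw)
print(penalty, np.prod(scales))          # balanced penalty 3.0, scale product 1.0

# The same total scale 1.0 spread over M = 4 parallel blocks, L = 3:
print(total_penalty([1.0, 0.0, 0.0, 0.0], L=3))      # concentrated: 3.0
print(total_penalty([0.25, 0.25, 0.25, 0.25], L=3))  # spread out: about 4.76
```

When $L > 2$ the exponent $2/L$ is below 1, so the summed penalty acts like an $\ell_{2/L}$ quasi-norm of the block scales: for a fixed total scale it is cheaper to concentrate it in a few blocks and leave the others at zero, which is the sense in which weight decay prunes blocks in this toy picture.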

Paper Structure

This paper contains 27 sections, 24 theorems, 83 equations, and 5 figures.

Key Result

Proposition 1

Let $\{U_\alpha\}_{\alpha \in \mathcal{A}}$ be a locally finite open cover of a smooth manifold $\mathcal{M}$. Then there is a $C^\infty$ partition of unity $\{\rho_i\}_{i=1}^\infty$ in which every $\rho_i$ has compact support and, for each $i$, $\mathrm{supp}(\rho_i) \subset U_\alpha$ for some $\alpha \in \mathcal{A}$.
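A concrete instance of Proposition 1 may help. The sketch below is a toy choice of ours, not a construction from the paper: the manifold is the unit circle $S^1$, parametrized by angle and covered by $K = 6$ open arcs $U_i$; each $\rho_i$ is a normalized $C^\infty$ bump whose compact support, a closed arc of half-width `R_SUPP`, sits inside its own arc $U_i$. All names and constants (`K`, `R_OPEN`, `R_SUPP`, `bump`, `partition_of_unity`) are hypothetical.

```python
import numpy as np

# Hypothetical setup: the unit circle S^1, parametrized by angle, covered by
# K open arcs U_i of half-width R_OPEN centred at 2*pi*i/K.
K = 6
CENTERS = 2 * np.pi * np.arange(K) / K
R_OPEN = 0.8   # half-width of each open arc U_i
R_SUPP = 0.7   # bump support half-width; needs pi/K < R_SUPP < R_OPEN


def angular_distance(theta, center):
    """Geodesic distance on S^1 between an array of angles and a centre."""
    d = np.mod(theta - center, 2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)


def bump(theta, center, radius=R_SUPP):
    """C-infinity bump exp(-1 / (1 - t^2)) with t = distance / radius, zero for t >= 1."""
    t = angular_distance(theta, center) / radius
    out = np.zeros_like(t)
    inside = t < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out


def partition_of_unity(theta):
    """rho_i = psi_i / sum_j psi_j; the denominator is positive because every
    angle lies within pi/K < R_SUPP of some centre."""
    psi = np.stack([bump(theta, c) for c in CENTERS])  # shape (K, n)
    return psi / psi.sum(axis=0, keepdims=True)


theta = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)
rho = partition_of_unity(theta)
print(np.allclose(rho.sum(axis=0), 1.0))   # True: the rho_i sum to one
```

Dividing by the sum of the bumps is the usual step that turns nonnegative bumps covering $\mathcal{M}$ into functions that sum to one; on a compact manifold such as $S^1$ a finite cover suffices, so local finiteness is automatic.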

Figures (5)

  • Figure 1: (a) Demonstration of the convolution operation $\mathcal{W} * z$, where the input is $z \in \mathbb{R}^{D\times w}$ and the output is $\mathcal{W} * z \in \mathbb{R}^{D\times w'}$. Here $\mathcal{W}_{j,:,:}$ is a $D \times w$ matrix for the $j$-th output channel. (b) Demonstration of the ConvResNeXt. $f_{1,1}, \dots, f_{N,M}$ are the building blocks, each of which is a convolutional neural network.
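The shapes in Figure 1 can be made concrete with a small NumPy sketch. It assumes, as our own choices rather than details taken from the paper, a kernel $\mathcal{W}$ of shape $(w', K, w)$ with odd length $K$ (the caption describes each slice $\mathcal{W}_{j,:,:}$ as $D \times w$; the sketch allows shorter filters), zero padding so the output keeps $D$ rows, bias-free two-layer building blocks, the input embedded in the first channel, and a linear readout. Each residual block adds the sum of its $M$ parallel paths to its input, matching the $N \times M$ grid of building blocks $f_{n,m}$ in panel (b). As in most deep-learning code, `conv` computes a cross-correlation; all function names are hypothetical.

```python
import numpy as np


def conv(W, z):
    """Sketch of W * z: z has shape (D, w), W has shape (w_out, K, w) with odd K.
    Zero padding keeps D rows, so the output has shape (D, w_out); entry (i, j)
    is the inner product of W[j] with the K x w window of z centred at row i."""
    D, w = z.shape
    w_out, K, w_in = W.shape
    assert w_in == w and K % 2 == 1
    pad = K // 2
    zp = np.pad(z, ((pad, pad), (0, 0)))          # zero-pad along the D axis
    out = np.empty((D, w_out))
    for i in range(D):
        window = zp[i:i + K, :]                    # shape (K, w)
        out[i] = np.tensordot(W, window, axes=([1, 2], [0, 1]))
    return out


def relu(x):
    return np.maximum(x, 0.0)


def block(z, W1, W2):
    """Hypothetical building block f_{n,m}: a two-layer bias-free CNN."""
    return conv(W2, relu(conv(W1, z)))


def conv_resnext(x, blocks, readout):
    """blocks[n][m] = (W1, W2) for path m of residual block n.
    Each residual block maps z to z + sum_m f_{n,m}(z)."""
    w = blocks[0][0][0].shape[2]                   # channel width of z
    z = np.zeros((x.shape[0], w))
    z[:, 0] = x                                    # embed x in channel 0
    for paths in blocks:                           # n = 1, ..., N
        z = z + sum(block(z, W1, W2) for W1, W2 in paths)
    return float(np.sum(readout * z))              # linear readout to a score


rng = np.random.default_rng(0)
D, w, w_hidden, K, N, M = 8, 3, 4, 3, 2, 3
blocks = [[(0.1 * rng.standard_normal((w_hidden, K, w)),
            0.1 * rng.standard_normal((w, K, w_hidden)))
           for _ in range(M)] for _ in range(N)]
readout = rng.standard_normal((D, w))
x = rng.standard_normal(D)
score = conv_resnext(x, blocks, readout)
label = 1 if score >= 0 else -1                    # binary classifier sign(f(x))
```

The sign of the score gives the predicted label. In the analysis the number of blocks and parallel paths may be very large, and weight decay determines which paths effectively stay active.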
  • Figure 2: Illustration of a Besov function on a $1$-dimensional manifold embedded in a $3$-dimensional ambient space.
  • Figure 3: MSE as a function of the effective degrees of freedom (dof) of different methods.
  • Figure 4: MSE as a function of dimension $D$.
  • Figure 5: MSE as a function of sample size $n$.

Theorems & Definitions (35)

  • Definition 1: Chart
  • Definition 2: $C^k$ Atlas
  • Definition 3: Smooth Manifold
  • Definition 4: $C^s$ functions on $\mathcal{M}$
  • Definition 5: Partition of Unity (Definition 13.4 in tu2011manifolds)
  • Proposition 1: Existence of a $C^\infty$ partition of unity (Theorem 13.7 in tu2011manifolds)
  • Definition 6: Reach (federer1959curvature; niyogi2008finding)
  • Definition 7: Modulus of Smoothness (devore1993constructive; suzuki2018adaptivity)
  • Definition 8: Besov Space $B_{p,q}^\alpha(\Omega)$
  • Proposition 2: Decomposition of Besov functions
  • ...and 25 more