Approximation analysis of CNNs from a feature extraction view

Jianfei Li, Han Feng, Ding-Xuan Zhou

TL;DR

The paper provides a rigorous theoretical framework showing that deep multi-channel CNNs can perform exact linear feature extraction by realizing inner products with dictionary elements through structured layers, and further demonstrates that 2D convolutions can encode singular values via suitable constructions. It delivers a comprehensive approximation analysis, proving near-optimal efficiency using depth $O(\log d)$ and parameter counts that scale favorably with the ambient or intrinsic dimension, especially for data supported on low-dimensional manifolds. The results formalize how CNNs act as dimension-reduction operators that preserve essential structure, offering insights into receptive-field growth, transfer-learning potential, and practical design choices such as stride equal to kernel size. Collectively, these findings bridge CNN architecture with classical approximation theory, providing rigorous guarantees for feature extraction and manifold-based function approximation in high-dimensional settings.

Abstract

Deep learning based on deep neural networks has been very successful in many practical applications, but it still lacks sufficient theoretical understanding due to the network architectures and structures. In this paper we establish an analysis of linear feature extraction by deep multi-channel convolutional neural networks (CNNs), which demonstrates the power of deep learning over traditional linear transformations such as Fourier, wavelet, and redundant dictionary coding methods. Moreover, we give an exact construction showing how linear feature extraction can be conducted efficiently with multi-channel CNNs. It can be applied to lower the essential dimension for approximating a high-dimensional function. Rates of function approximation by such deep networks implemented with channels and followed by fully-connected layers are investigated as well. Harmonic analysis for factorizing linear features into multi-resolution convolutions plays an essential role in our work. In addition, a dedicated vectorization of matrices is constructed, which bridges 1D and 2D CNNs and allows us to carry out the corresponding 2D analysis.

Paper Structure (10 sections, 10 theorems, 73 equations, 2 figures)

Key Result

Theorem 1

Let $J, m\geq 1$ be integers and $\Omega$ be a compact subset of $\mathbb{R}^{2^J}$. For any dictionary of row vectors $\{v^{(\ell)}\}_{\ell=1}^m\subset \mathbb{R}^{2^J}$, there exists a multi-channel DCNN $\{h_j\}_{j=1}^J$ of depth $J$ with constant filter size $2$ and stride $2$ such that $h_J$ has … In addition, for any $j=1,\ldots, J-1$, …
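
To make the setting of the theorem concrete, the following is a minimal NumPy sketch of a depth-$J$ stack of filter-size-2, stride-2 multi-channel layers that reproduces the inner products $\langle v^{(\ell)}, x\rangle$ exactly. It is not the paper's construction: it is a naive illustration in which the first $J-1$ layers only move spatial entries into channels, so the channel count grows to $2^{J-1}$, whereas the paper is concerned with doing this efficiently. The layers here are purely linear (no activation or bias, which is an assumption of this sketch), and all function names are hypothetical.

```python
import numpy as np

def conv_stride2(h, W):
    """Filter size 2, stride 2: out[o, i] = sum_{c,t} W[o, c, t] * h[c, 2*i + t]."""
    c, n = h.shape
    pairs = h.reshape(c, n // 2, 2)            # pairs[c, i, t] = h[c, 2*i + t]
    return np.einsum("oct,cit->oi", W, pairs)

def space_to_depth_filters(c):
    """Filters that copy tap t of input channel k into output channel 2*k + t."""
    W = np.zeros((2 * c, c, 2))
    for k in range(c):
        for t in range(2):
            W[2 * k + t, k, t] = 1.0
    return W

def naive_feature_cnn(V):
    """Return J filter banks whose composition maps x in R^(2^J) to V @ x."""
    m, d = V.shape
    J = d.bit_length() - 1
    assert J >= 1 and d == 2 ** J, "input dimension must be a power of two"
    # Layers 1..J-1: halve the length and double the channels, losing nothing.
    filters = [space_to_depth_filters(2 ** j) for j in range(J - 1)]
    # Push the coordinate indices through the same layers to see where
    # each entry of x ends up (the layers only permute entries).
    idx = np.arange(d, dtype=float).reshape(1, d)
    for W in filters:
        idx = conv_stride2(idx, W)
    idx = idx.astype(int)                      # shape (2**(J-1), 2)
    # Layer J: one output channel per dictionary element v^(l).
    W_last = np.zeros((m, idx.shape[0], 2))
    for c in range(idx.shape[0]):
        for t in range(2):
            W_last[:, c, t] = V[:, idx[c, t]]
    return filters + [W_last]

def forward(x, filters):
    """Apply the layers to a single-channel input and return the m features."""
    h = x.reshape(1, -1)
    for W in filters:
        h = conv_stride2(h, W)
    return h[:, 0]

# Usage: 3 dictionary elements in R^8, so J = 3 layers.
rng = np.random.default_rng(0)
V = rng.standard_normal((3, 8))
x = rng.standard_normal(8)
print(np.allclose(forward(x, naive_feature_cnn(V)), V @ x))
```

The sketch shows only that depth $J$ with filter size 2 and stride 2 is enough to realize arbitrary linear features when channels are unconstrained; the channel and parameter counts stated in the paper are not reflected here.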

Figures (2)

  • Figure 1: Visualization of mapping $T$ for a kernel sequence $(2,3,2)$. Then $d=2\cdot 3\cdot 2=12$, $\tilde{s}_1=12/2=6$, $\tilde{s}_2=12/(2\cdot 3)=2$, $\tilde{s}_3=1$. For element $Y_{9,5}$, $\lambda(9,5)=(3,6,1)$, $\delta(9,5)=1+(3-1)6^2+(6-1)2^2+(1-1)1^2=93$, $Y_{9,5} \overset{T}\longmapsto \tilde{Y}_{93}$.
  • Figure 2: Affine spaces.
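
The index arithmetic in the Figure 1 caption can be checked with a short Python sketch. The formulas below are read off from that single worked example rather than from the paper's definitions, so they are assumptions: $\tilde{s}_j = d/(k_1\cdots k_j)$, $\lambda$ taken as the 1-based row-major block index at each level, and $\delta = 1 + \sum_j (\lambda_j - 1)\tilde{s}_j^2$. The function names are hypothetical.

```python
from math import prod

def block_sizes(kernels):
    """s~_j = d / (k_1 * ... * k_j), where d is the product of all kernel sizes."""
    d = prod(kernels)
    sizes, running = [], 1
    for k in kernels:
        running *= k
        sizes.append(d // running)
    return sizes

def lam(row, col, kernels):
    """Assumed reading of lambda: 1-based row-major block index at each level."""
    r, c = row - 1, col - 1                    # switch to zero-based positions
    out = []
    for k, s in zip(kernels, block_sizes(kernels)):
        block_row, r = divmod(r, s)            # which block, and offset inside it
        block_col, c = divmod(c, s)
        out.append(block_row * k + block_col + 1)
    return tuple(out)

def delta(lmbda, kernels):
    """Assumed reading of delta: 1 + sum_j (lambda_j - 1) * s~_j ** 2."""
    return 1 + sum((l - 1) * s * s for l, s in zip(lmbda, block_sizes(kernels)))

kernels = (2, 3, 2)
print(block_sizes(kernels))                    # [6, 2, 1]
print(lam(9, 5, kernels))                      # (3, 6, 1)
print(delta(lam(9, 5, kernels), kernels))      # 93
```

Under these assumptions the map from $(i,j)$ to $\delta(i,j)$ is a mixed-radix encoding, so it sends the $12\times 12$ entries of $Y$ to the indices $1,\ldots,144$ without collisions.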

Theorems & Definitions (21)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Theorem 2
  • proof
  • Proposition 3
  • proof
  • Corollary 4
  • proof
  • Remark 5
  • ...and 11 more