Approximation analysis of CNNs from a feature extraction view
Jianfei Li, Han Feng, Ding-Xuan Zhou
TL;DR
The paper provides a rigorous theoretical framework showing that deep multi-channel CNNs can perform exact linear feature extraction by realizing inner products with dictionary elements through structured layers, and further demonstrates that 2D convolutions can encode singular values via suitable constructions. It delivers a comprehensive approximation analysis, proving near-optimal efficiency using depth $O(\log d)$ and parameter counts that scale favorably with the ambient or intrinsic dimension, especially for data supported on low-dimensional manifolds. The results formalize how CNNs act as dimension-reduction operators that preserve essential structure, offering insights into receptive-field growth, transfer-learning potential, and practical design choices such as stride equal to kernel size. Collectively, these findings bridge CNN architecture with classical approximation theory, providing rigorous guarantees for feature extraction and manifold-based function approximation in high-dimensional settings.
Abstract
Deep learning based on deep neural networks has been very successful in many practical applications, but it still lacks sufficient theoretical understanding, owing to the network architectures and structures. In this paper we establish an analysis of linear feature extraction by deep multi-channel convolutional neural networks (CNNs), which demonstrates the power of deep learning over traditional linear transformations such as Fourier transforms, wavelets, and redundant dictionary coding methods. Moreover, we give an exact construction showing how linear feature extraction can be conducted efficiently with multi-channel CNNs. It can be applied to lower the essential dimension for approximating a high-dimensional function. Rates of function approximation by such deep networks implemented with channels and followed by fully-connected layers are investigated as well. Harmonic analysis for factorizing linear features into multi-resolution convolutions plays an essential role in our work. In addition, a dedicated vectorization of matrices is constructed, which bridges 1D CNNs and 2D CNNs and allows us to carry out a corresponding 2D analysis.
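
To make the idea of factorizing a linear feature into multi-resolution convolutions concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a single channel, kernel size 2, stride 2, no bias, and no activation, and it covers only the special case in which the dictionary element $w \in \mathbb{R}^d$ with $d = 2^J$ is a Kronecker product of $J$ length-2 filters. In that case, $J = \log_2 d$ strided convolution layers return the inner product $\langle w, x \rangle$ exactly. The names `strided_conv` and `cnn_inner_product` are hypothetical helpers introduced for illustration.

```python
import numpy as np
from functools import reduce


def strided_conv(x, f):
    """Linear 1D convolution (correlation form) with stride equal to the kernel size."""
    k = len(f)
    # Each row of the reshaped array is one non-overlapping window of length k.
    return x.reshape(-1, k) @ f


def cnn_inner_product(x, filters):
    """Apply the strided layers in order; the final output has length 1."""
    out = x
    for f in filters:
        out = strided_conv(out, f)
    return out.item()


rng = np.random.default_rng(0)
depth = 4                      # J = log2(d) layers
d = 2 ** depth                 # input dimension
filters = [rng.standard_normal(2) for _ in range(depth)]

# Effective dictionary element realized by the stack:
# later layers act on coarser scales, so they become the outer Kronecker factors.
w = reduce(lambda acc, f: np.kron(f, acc), filters, np.ones(1))

x = rng.standard_normal(d)
feature = cnn_inner_product(x, filters)

# The CNN output agrees with the direct inner product <w, x>.
print(np.isclose(feature, w @ x))
```

A general $w$ is not a single Kronecker product, but it can always be written as a sum of such products, which is one way to see why multiple channels are needed for arbitrary dictionary elements. The sketch uses only $2 \log_2 d$ filter weights for its structured $w$, compared with $d$ weights for a dense linear map.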
