Learning of deep convolutional network image classifiers via stochastic gradient descent and over-parametrization
Michael Kohler, Adam Krzyzak, Alisha Sänger
TL;DR
This work analyzes image classification with deep CNNs trained by stochastic gradient descent in an over-parameterized regime. By formulating the estimator as a linear combination of truncated CNNs and applying SGD with a projection step, the authors derive excess-risk bounds that can be dimension-free under a hierarchical max-pooling model for the a posteriori probability. The core contribution is a general bound on the logistic risk for SGD-learned over-parameterized CNN ensembles, plus specialized rates for hierarchical models, including an improved rate under margin-type conditions. The results provide theoretical justification for dimension-independent learning performance on large-scale image datasets, connecting optimization dynamics, approximation power, and generalization through a unified framework. The work extends prior analyses to stochastic optimization and max-pooling hierarchies, with implications for understanding why gradient-based training of deep CNNs can generalize well in high-dimensional image spaces.
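As a rough illustration of the estimator described above, the following minimal sketch runs SGD with a projection step on the logistic loss for a linear combination of truncated convolutional features. It is not the authors' construction. It assumes that each "network" is a single convolutional filter with ReLU and global max-pooling, that the filters are fixed and only the outer weights are trained (the paper learns all weights), and that the projection is onto a Euclidean ball around the initial weights. All function names and parameters (`conv_maxpool`, `beta`, `radius`, and so on) are hypothetical.

```python
import math
import random


def conv_maxpool(image, kernel, bias):
    """Valid 2D convolution with ReLU, followed by global max-pooling."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    best = 0.0  # ReLU outputs are >= 0, so 0 is a safe starting maximum
    for i in range(rows - k + 1):
        for j in range(cols - k + 1):
            s = bias + sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(k) for b in range(k))
            best = max(best, s)
    return best


def truncate(z, beta):
    """Clip z to the interval [-beta, beta]."""
    return max(-beta, min(beta, z))


def features(image, filters, beta):
    """Truncated outputs of all (hypothetical) small networks for one image."""
    return [truncate(conv_maxpool(image, ker, b), beta) for ker, b in filters]


def logistic_loss(z):
    """Numerically stable log(1 + exp(-z))."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def project_onto_ball(w, center, radius):
    """Euclidean projection of w onto the ball of given radius around center."""
    dist = math.sqrt(sum((wi - ci) ** 2 for wi, ci in zip(w, center)))
    if dist <= radius:
        return list(w)
    scale = radius / dist
    return [ci + scale * (wi - ci) for wi, ci in zip(w, center)]


def projected_sgd(X, y, steps, lr, radius, seed=0):
    """SGD on the logistic loss over the outer weights, projecting after each step.

    Assumption: labels y[i] are in {-1, +1} and the classifier is sign(<w, x>).
    """
    rng = random.Random(seed)
    w0 = [0.0] * len(X[0])
    w = list(w0)
    for _ in range(steps):
        i = rng.randrange(len(y))  # draw one sample at random
        margin = y[i] * sum(wk * xk for wk, xk in zip(w, X[i]))
        # d/dw log(1 + exp(-y * <w, x>)) = -y * sigmoid(-margin) * x
        g = -y[i] * sigmoid(-margin)
        w = [wk - lr * g * xk for wk, xk in zip(w, X[i])]
        w = project_onto_ball(w, w0, radius)
    return w
```

In this sketch the truncation keeps every feature bounded and the projection keeps the weights close to their initial values. These two choices are simplifications made here for illustration and may differ from the conditions used in the paper's analysis.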
Abstract
Image classification from independent and identically distributed random variables is considered. Image classifiers are defined which are based on a linear combination of deep convolutional networks with a max-pooling layer. Here all the weights are learned by stochastic gradient descent. A general result is presented which shows that the image classifiers are able to approximate the best possible deep convolutional network. If the a posteriori probability satisfies a suitable hierarchical composition model, it is shown that the corresponding deep convolutional neural network image classifier achieves a rate of convergence which is independent of the dimension of the images.
