Effect of Convolutional Depth on Image Recognition Performance: VGG vs. ResNet vs. GoogLeNet

Manfred M. Fischer; Joshua Pitts

TL;DR

We show that effective depth, not nominal depth, is the operative quantity governing whether depth acts as a productive scaling dimension in convolutional networks.

Abstract

Increasing convolutional depth has been central to advances in image recognition, yet deeper networks do not uniformly yield higher accuracy, stable optimization, or efficient computation. We present a controlled comparative study of three canonical convolutional neural network architectures (VGG, ResNet, and GoogLeNet) to isolate how depth influences classification performance, convergence behavior, and computational efficiency. By standardizing training protocols and explicitly distinguishing between nominal and effective depth, we show that the benefits of depth depend critically on the architectural mechanisms that govern how depth is effectively expressed during training, rather than on nominal depth alone. Whereas plain deep networks exhibit early accuracy saturation and optimization instability, residual and Inception-based architectures consistently translate additional depth into improved accuracy at lower effective depth, with favorable accuracy-compute trade-offs. These findings demonstrate that effective depth, not nominal depth, is the operative quantity governing whether depth acts as a productive scaling dimension in convolutional networks.
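The abstract distinguishes nominal from effective depth, but this excerpt does not give the paper's formal definition of effective depth. One way to make the idea concrete for residual networks, used here purely as an illustrative assumption, is to unroll a stack of residual blocks into all of its input-to-output paths: each block contributes either its identity skip (zero convolutions) or its residual branch. The sketch counts those paths by length. The helper names `residual_path_lengths` and `mean_path_length`, and the 16-block, two-convolution layout (roughly ResNet-34's basic-block stack, ignoring the stem, projection shortcuts, and classifier), are hypothetical choices, not the authors' method.

```python
from math import comb


def residual_path_lengths(num_blocks: int, convs_per_branch: int) -> dict[int, int]:
    """Map path length (in conv layers) -> number of paths in an unrolled residual stack.

    Assumption: each block offers exactly two routes, the identity skip (adds 0 conv
    layers) or the residual branch (adds `convs_per_branch` conv layers). A path that
    takes the branch in m of the n blocks has length m * convs_per_branch, and there
    are C(n, m) such paths.
    """
    return {m * convs_per_branch: comb(num_blocks, m) for m in range(num_blocks + 1)}


def mean_path_length(num_blocks: int, convs_per_branch: int) -> float:
    """Average path length when every path is weighted equally (an assumption)."""
    dist = residual_path_lengths(num_blocks, convs_per_branch)
    total = sum(dist.values())  # equals 2 ** num_blocks
    return sum(length * count for length, count in dist.items()) / total


if __name__ == "__main__":
    n, k = 16, 2  # hypothetical stack of 16 residual blocks with two convs per branch
    nominal = n * k  # nominal depth: every conv layer in the stack counted once
    print("nominal depth (conv layers in the stack):", nominal)
    print("number of paths:", sum(residual_path_lengths(n, k).values()))
    print("mean path length:", mean_path_length(n, k))
```

For this layout the stack's nominal depth is 32 convolutional layers, yet under equal weighting of paths the mean path length is 16, and only one of the 65,536 paths traverses every residual branch. Whether the paper measures effective depth this way is not stated in this excerpt.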

Paper Structure (31 sections, 8 equations, 7 figures, 1 table)

Introduction
Related Work
Depth in Convolutional Neural Networks
Architectural Approaches to Depth
Comparative Studies
Architectures and Depth Definition
Network Architectures
Definition of Convolutional Depth
Experimental Setup and Design
Datasets
Computational Cost Notation
Training Protocol
Evaluation Metrics
Experimental Design
Results
...and 16 more sections

Figures (7)

Figure 1: Illustration of nominal depth versus effective depth in convolutional neural networks. While nominal depth counts the number of convolutional layers, effective depth reflects the length of information paths enabled by architectural mechanisms such as residual connections and multi-branch modules.
Figure 2: Schematic overview of the evaluated architectures. VGG employs uniform stacking of convolutional layers, ResNet introduces residual connections to facilitate gradient propagation, and GoogLeNet uses Inception modules to combine multiple receptive fields within a single module.
Figure 3: Classification accuracy as a function of convolutional depth for VGG, ResNet, and GoogLeNet. Residual and Inception-based networks continue to benefit from increased depth, while VGG-style networks exhibit early saturation.
Figure 4: Training loss convergence across VGG, ResNet, and GoogLeNet. Architectures with residual or multi-branch connectivity converge faster and more smoothly as depth increases.
Figure 5: Optimization stability as a function of convolutional depth. Residual and Inception-based architectures maintain stable gradient norms, while deep VGG-style networks exhibit gradient attenuation.
...and 2 more figures
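Figure 5 reports stable gradient norms for residual and Inception-based networks and attenuation for deep VGG-style networks. The toy sketch here is not the paper's measurement protocol and covers only the plain-versus-residual contrast. It backpropagates a unit gradient through a stack of random linear layers, a hypothetical stand-in for convolutions with no nonlinearities or normalization, to show the mechanism an identity shortcut provides. The function name `backprop_norm` and the settings (width 16, weight scale sigma = 0.2) are illustrative assumptions.

```python
import math
import random


def backprop_norm(depth: int, width: int, sigma: float, residual: bool, seed: int = 0) -> float:
    """Norm of a unit output gradient after backpropagating through `depth` random linear layers.

    Toy model (assumption, not a real CNN):
      plain layer:    x_{l+1} = W_l x_l          ->  g_l = W_l^T g_{l+1}
      residual layer: x_{l+1} = x_l + W_l x_l    ->  g_l = g_{l+1} + W_l^T g_{l+1}
    Entries of W_l are drawn i.i.d. from N(0, sigma^2 / width).
    """
    rng = random.Random(seed)  # same seed -> identical weights for plain and residual runs
    std = sigma / math.sqrt(width)
    g = [1.0 / math.sqrt(width)] * width  # unit-norm gradient at the network output
    for _ in range(depth):
        w = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        # Compute W^T g: component j sums W[i][j] * g[i] over rows i.
        wt_g = [sum(w[i][j] * g[i] for i in range(width)) for j in range(width)]
        # The residual layer adds the identity path's contribution g to the branch's W^T g.
        g = [gi + v for gi, v in zip(g, wt_g)] if residual else wt_g
    return math.sqrt(sum(v * v for v in g))


if __name__ == "__main__":
    for depth in (5, 10, 20, 30):
        plain = backprop_norm(depth, width=16, sigma=0.2, residual=False)
        res = backprop_norm(depth, width=16, sigma=0.2, residual=True)
        print(f"depth={depth:2d}  plain={plain:.2e}  residual={res:.2e}")
```

With the same weight draws, the plain chain's gradient shrinks roughly geometrically, by about a factor of sigma per layer, while the identity term keeps the residual chain's gradient near unit scale. With a much larger sigma the residual gradient grows instead, so this toy illustrates only the vanishing side of the picture.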
