Effect of Convolutional Depth on Image Recognition Performance: VGG vs. ResNet vs. GoogLeNet

Manfred M. Fischer, Joshua Pitts

TL;DR

Effective depth, not nominal depth, determines whether added depth is a productive scaling dimension in convolutional networks.

Abstract

Increasing convolutional depth has been central to advances in image recognition, yet deeper networks do not uniformly yield higher accuracy, stable optimization, or efficient computation. We present a controlled comparative study of three canonical convolutional neural network architectures (VGG, ResNet, and GoogLeNet) to isolate how depth influences classification performance, convergence behavior, and computational efficiency. By standardizing training protocols and explicitly distinguishing between nominal and effective depth, we show that the benefits of depth depend critically on architectural mechanisms that constrain its effective manifestation during training rather than on nominal depth alone. Although plain deep networks exhibit early accuracy saturation and optimization instability, residual and inception-based architectures consistently translate additional depth into improved accuracy at lower effective depth and with favorable accuracy-compute trade-offs. These findings demonstrate that effective depth, not nominal depth, is the operative quantity governing depth's role as a productive scaling dimension in convolutional networks.

Paper Structure (31 sections, 8 equations, 7 figures, 1 table)

This paper contains 31 sections, 8 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Illustration of nominal depth versus effective depth in convolutional neural networks. While nominal depth counts the number of convolutional layers, effective depth reflects the length of information paths enabled by architectural mechanisms such as residual connections and multi-branch modules.
  • Figure 2: Schematic overview of the evaluated architectures. VGG employs uniform stacking of convolutional layers, ResNet introduces residual connections to facilitate gradient propagation, and GoogLeNet uses Inception modules to combine multiple receptive fields within a single layer.
  • Figure 3: Classification accuracy as a function of convolutional depth for VGG, ResNet, and GoogLeNet. Residual and Inception-based networks continue to benefit from increased depth, while VGG-style networks exhibit early saturation.
  • Figure 4: Training loss convergence across VGG, ResNet, and GoogLeNet. Architectures with residual or multi-branch connectivity converge faster and more smoothly as depth increases.
  • Figure 5: Optimization stability as a function of convolutional depth. Residual and Inception-based architectures maintain stable gradient norms, while deep VGG-style networks exhibit attenuation.
  • ...and 2 more figures
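
The Figure 1 caption contrasts nominal depth (a layer count) with effective depth (the length of the information paths the architecture permits). The paper's exact definition of effective depth is not shown on this page, so the sketch below rests on an assumption: it uses the path-counting view of residual networks, in which each residual block can either be traversed or bypassed through its skip connection. The function names are hypothetical and serve only as illustration.

```python
from math import comb


def plain_path_length(num_layers: int) -> int:
    """A plain (VGG-style) stack has one path, and it crosses every layer."""
    return num_layers


def residual_path_distribution(num_blocks: int) -> list[float]:
    """Fraction of paths crossing exactly k residual blocks, k = 0..num_blocks.

    Assumption: each block is either traversed or skipped, giving
    2**num_blocks paths whose lengths follow a binomial distribution.
    """
    total_paths = 2 ** num_blocks
    return [comb(num_blocks, k) / total_paths for k in range(num_blocks + 1)]


def mean_residual_path_length(num_blocks: int) -> float:
    """Average number of blocks crossed per path under the assumption above."""
    dist = residual_path_distribution(num_blocks)
    return sum(k * p for k, p in enumerate(dist))
```

Under this assumption, a plain stack of 16 layers has a single path of length 16, whereas a 16-block residual stack has paths that cross 8 blocks on average. This is one way to read "improved accuracy at lower effective depth", although the measure the paper uses may differ.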