ResNet: Enabling Deep Convolutional Neural Networks through Residual Learning

Xingyu Liu, Kun Ming Goh

TL;DR

The paper tackles the difficulty of training very deep CNNs, which suffer from vanishing gradients and degradation. It introduces residual learning with skip connections: each block learns a residual function $F(x)=H(x)-x$, so that its output is $H(x)=F(x)+x$, which gives gradients a direct path through the network and makes optimization easier. Empirically, ResNet-18 configured for CIFAR-10 achieves 89.9% top-1 accuracy, outperforming a comparable baseline CNN by 5.8 percentage points while converging more quickly and stably. The work also discusses extensions such as ResNeXt, DenseNet, and Wide ResNet, and provides ablation analyses showing that skip connections are essential for gradient propagation and for the performance gains, supporting residual learning as a scalable paradigm for deep computer vision models.
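The relation $H(x)=F(x)+x$ can be illustrated with a minimal sketch in plain Python. This is not the paper's ResNet-18 implementation: the two-layer form of $F$, the helper names `relu`, `matvec`, and `residual_block`, and the weight matrices `W1` and `W2` are assumptions made for illustration, and the activation that real residual blocks usually apply after the addition is omitted.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def residual_block(x, W1, W2):
    """Return H(x) = F(x) + x, with the assumed form F(x) = W2 * relu(W1 * x)."""
    fx = matvec(W2, relu(matvec(W1, x)))   # residual branch F(x)
    return [f + a for f, a in zip(fx, x)]  # skip connection adds x back


x = [1.0, -2.0, 0.5]
zeros = [[0.0] * 3 for _ in range(3)]

# With all residual weights at zero, F(x) = 0 and the block is the identity.
identity_out = residual_block(x, zeros, zeros)
```

In this sketch, setting the residual weights to zero makes the block return its input unchanged, which is one way to see why an extra residual block need not hurt a shallower solution.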

Abstract

Convolutional Neural Networks (CNNs) have revolutionized computer vision, but training very deep networks has been challenging due to the vanishing gradient problem. This paper explores Residual Networks (ResNet), introduced by He et al. (2015), which overcome this limitation by using skip connections. ResNet enables the training of networks with hundreds of layers by allowing gradients to flow directly through shortcut connections that bypass intermediate layers. In our implementation on the CIFAR-10 dataset, ResNet-18 achieves 89.9% accuracy compared to 84.1% for a traditional deep CNN of similar depth, while also converging faster and training more stably.
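The claim that gradients flow directly through the shortcut can be made concrete with a one-step chain-rule sketch. The loss symbol $\mathcal{L}$ and the identity matrix $I$ are notation introduced here for illustration, and the shortcut is assumed to be a pure identity mapping with no activation after the addition:

$$
H(x) = F(x) + x
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial x}
= \frac{\partial \mathcal{L}}{\partial H}\left(\frac{\partial F}{\partial x} + I\right)
$$

Under these assumptions, the $I$ term passes $\partial \mathcal{L}/\partial H$ back to $x$ unchanged, so the gradient at the block's input does not vanish merely because $\partial F/\partial x$ is small.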

Paper Structure

This paper contains 23 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Training accuracy and loss trends for all models. The ResNet-18 model (green) achieves faster convergence and lower training loss, indicating improved gradient flow and stable optimisation compared to the baseline CNN.
  • Figure 2: Validation accuracy and loss trends for all models. The ResNet-18 (green) converges faster and achieves the highest validation accuracy with the lowest validation loss, demonstrating the benefit of residual connections.
  • Figure 3: Gradient magnitude distribution across layers for different models. The ResNet-18 with skip connections maintains stable gradient flow throughout the network, while the baseline CNN and non-residual variant experience severe vanishing gradients in earlier layers.
  • Figure 4: Training accuracy and loss across all models. The ResNet-18 with skip connections converges faster and maintains lower loss throughout training, while the no-skip variant suffers from slower convergence and higher residual error.
  • Figure 5: Validation accuracy and loss across all models, including ResNet-18 variants. The ResNet-18 with skip connections achieves the highest validation accuracy and lowest loss, confirming the effectiveness of residual learning compared to the no-skip variant and other baselines.
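The qualitative contrast described for Figure 3 can be mimicked with a toy scalar model. This is a hedged sketch, not a reproduction of the paper's measurements: the function `input_gradient`, the depth of 18 layers, and the local slope of 0.1 are hypothetical values chosen for illustration, and scalar layers ignore convolutions, normalization, and nonlinearities.

```python
def input_gradient(local_slopes, skip):
    """Gradient magnitude reaching the input of a chain of scalar layers.

    Each plain layer multiplies the backward signal by its local slope s.
    With a skip connection the layer computes x + f(x), so the factor is 1 + s.
    """
    grad = 1.0
    for s in reversed(local_slopes):
        grad *= (1.0 + s) if skip else s
    return abs(grad)


# Hypothetical setting: 18 layers, each with a small local slope of 0.1.
slopes = [0.1] * 18

plain = input_gradient(slopes, skip=False)    # product of 18 small factors
residual = input_gradient(slopes, skip=True)  # product of 18 factors near 1

print(f"plain chain:    {plain:.3e}")
print(f"residual chain: {residual:.3e}")
```

In this toy setting the plain chain's gradient shrinks geometrically with depth, while the residual chain's gradient stays of order one, which is consistent with the trend the Figure 3 caption describes.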