
Multi-Residual Networks: Improving the Speed and Accuracy of Residual Networks

Masoud Abdi, Saeid Nahavandi

TL;DR

A new convolutional neural network architecture is proposed which builds upon the success of residual networks by explicitly exploiting the interpretation of very deep networks as an ensemble, and generates models that are wider, rather than deeper, which significantly improves accuracy.

Abstract

In this article, we take one step toward understanding the learning behavior of deep residual networks and support the observation that deep residual networks behave like ensembles. We propose a new convolutional neural network architecture which builds upon the success of residual networks by explicitly exploiting the interpretation of very deep networks as an ensemble. The proposed multi-residual network increases the number of residual functions in the residual blocks. Our architecture generates models that are wider, rather than deeper, which significantly improves accuracy. We show that our model achieves error rates of 3.73% and 19.45% on CIFAR-10 and CIFAR-100 respectively, outperforming almost all existing models. We also demonstrate that our model outperforms very deep residual networks by 0.22% (top-1 error) on the full ImageNet 2012 classification dataset. Additionally, inspired by the parallel structure of multi-residual networks, we investigate a model parallelism technique. The method distributes the computation of residual blocks among the processors, yielding up to a 15% improvement in computational complexity.
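To make "increasing the number of residual functions in the residual blocks" concrete, here is a minimal sketch in plain Python. It assumes that a multi-residual block adds the outputs of all its residual functions to the identity (skip) connection, which is one natural reading of the abstract. The names `multi_residual_block` and `scale_by` are hypothetical, and the toy scaling functions merely stand in for the convolutional layers a real block would use.

```python
from typing import Callable, List, Sequence

Vector = List[float]
ResidualFn = Callable[[Vector], Vector]


def multi_residual_block(x: Vector, residual_fns: Sequence[ResidualFn]) -> Vector:
    """Return x plus the sum of every residual function applied to x.

    Assumption: the block output is x + f_1(x) + ... + f_k(x).
    With a single function this reduces to an ordinary residual block.
    """
    out = list(x)  # identity (skip) connection
    for f in residual_fns:
        fx = f(x)
        if len(fx) != len(x):
            raise ValueError("residual function must preserve the input shape")
        out = [o + v for o, v in zip(out, fx)]
    return out


def scale_by(factor: float) -> ResidualFn:
    """Toy residual function (hypothetical stand-in for a conv stack)."""
    return lambda x: [factor * v for v in x]


x = [1.0, 2.0]
single = multi_residual_block(x, [scale_by(0.5)])                 # ordinary residual block
multi = multi_residual_block(x, [scale_by(0.5), scale_by(0.25)])  # two residual functions
print(single, multi)
```

Because every residual function reads the same input and the outputs are only summed at the end, the functions within a block do not depend on one another, which is the property the model parallelism technique mentioned in the abstract relies on.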

Paper Structure

This paper contains 13 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: (a) A residual network; (b) Deleting $f_2$ from a residual network (Veit et al., 2016). Residual networks have $2^n$ paths connecting the input to the output. Deleting a block from the residual network reduces the number of paths to $2^{n-1}$.
  • Figure 2: A residual block (left) versus a multi-residual block (right).
  • Figure 3: Comparison of the residual network and the proposed multi-residual network on the CIFAR-10 test set, illustrating the effective range phenomenon. Each curve is the mean over 5 runs. (a) The network depth is below $n_0$, and the multi-residual network performs worse than the original residual network; (b) Both networks have comparable performance; (c) The proposed multi-residual network outperforms the original residual network.
  • Figure 4: Model parallelization of a multi-residual block with four residual functions on two GPUs.
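The path counts quoted in the Figure 1 caption can be checked with a short enumeration. The sketch below assumes each block lets a path either take the skip connection or pass through a residual function, so a network of $n$ blocks has $2^n$ paths and deleting one block leaves $2^{n-1}$. The `fns_per_block` parameter is an extrapolation of the same counting argument to multi-residual blocks and is not a figure taken from the source. The function name `count_paths` is hypothetical.

```python
from itertools import product


def count_paths(num_blocks: int, fns_per_block: int = 1) -> int:
    """Count input-to-output paths by explicit enumeration.

    At each block a path picks 0 (skip connection) or one of the
    residual functions 1..fns_per_block.
    """
    choices = range(fns_per_block + 1)
    return sum(1 for _ in product(choices, repeat=num_blocks))


n = 3
full = count_paths(n)           # residual network with n blocks
deleted = count_paths(n - 1)    # same network after deleting one block
wide = count_paths(n, fns_per_block=2)  # extrapolation: two functions per block
print(full, deleted, wide)
```

Enumeration is used instead of the closed form so the result does not presuppose the formula it is meant to confirm. It is only practical for small `n`.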