Do Deep Nets Really Need to be Deep?

Lei Jimmy Ba, Rich Caruana

TL;DR

The paper investigates whether depth is truly necessary for strong performance in neural networks. By first training deep models and then teaching shallow nets to imitate their outputs through model compression and logit regression, the authors demonstrate that shallow networks can match or approach the accuracy of deep architectures on TIMIT phoneme recognition and CIFAR-10 image classification, given access to unlabeled data and a strong teacher. The results suggest that the apparent advantage of deep models may partly reflect current training procedures, and the authors advocate for improved algorithms to train shallow networks directly. This work highlights the potential of mimic learning as a practical route to high performance with reduced architectural depth.

Abstract

Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. In this extended abstract, we show that shallow feed-forward networks can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. Moreover, in some cases the shallow neural nets can learn these deep functions using a total number of parameters similar to the original deep model. We evaluate our method on the TIMIT phoneme recognition task and are able to train shallow fully-connected nets that perform similarly to complex, well-engineered, deep convolutional architectures. Our success in training shallow neural nets to mimic deeper models suggests that there probably exist better algorithms for training shallow feed-forward nets than those currently available.
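
As a rough illustration of the mimic training summarized above (a shallow student regressing on a teacher's logits over unlabeled inputs), the following is a minimal pure-Python sketch, not the authors' implementation. Everything named here is an assumption made for illustration: `teacher_logits` is a hypothetical stand-in for a trained deep model, and the student size, tanh activation, learning rate, and synthetic 2-D inputs are arbitrary choices. The loss is a squared error between student outputs and teacher logits, averaged over examples.

```python
import math
import random

def teacher_logits(x):
    # Hypothetical stand-in for a trained deep model: maps an input to 2 logits.
    return [math.sin(2.0 * x[0]) + x[1], math.tanh(x[0] - x[1])]

def init_student(n_in, n_hidden, n_out, rng):
    # One hidden layer: W maps input -> hidden, B maps hidden -> output logits.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    B = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return W, B

def student_forward(x, W, B):
    # Hidden activations h = tanh(W x); student logits g = B h.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    g = [sum(b * hi for b, hi in zip(row, h)) for row in B]
    return h, g

def mimic_loss(data, W, B):
    # Mean over examples of 0.5 * ||g(x) - z||^2, where z are teacher logits.
    total = 0.0
    for x, z in data:
        _, g = student_forward(x, W, B)
        total += 0.5 * sum((gi - zi) ** 2 for gi, zi in zip(g, z))
    return total / len(data)

def sgd_step(x, z, W, B, lr):
    # One stochastic gradient step on the squared logit error for one example.
    h, g = student_forward(x, W, B)
    err = [gi - zi for gi, zi in zip(g, z)]
    # Gradient w.r.t. hidden activations, computed before B is modified.
    dh = [sum(err[k] * B[k][j] for k in range(len(B))) for j in range(len(h))]
    for k in range(len(B)):
        for j in range(len(h)):
            B[k][j] -= lr * err[k] * h[j]
    for j in range(len(W)):
        dpre = dh[j] * (1.0 - h[j] ** 2)  # derivative of tanh
        for i in range(len(x)):
            W[j][i] -= lr * dpre * x[i]

def train_mimic(n_unlabeled=200, n_hidden=16, epochs=50, lr=0.02, seed=0):
    rng = random.Random(seed)
    # Unlabeled inputs: no ground-truth labels are used, only teacher outputs.
    xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_unlabeled)]
    data = [(x, teacher_logits(x)) for x in xs]
    W, B = init_student(2, n_hidden, 2, rng)
    before = mimic_loss(data, W, B)
    for _ in range(epochs):
        for x, z in data:
            sgd_step(x, z, W, B, lr)
    after = mimic_loss(data, W, B)
    return before, after
```

Calling `train_mimic()` returns the mimic loss before and after training, which gives a quick check that the student is moving toward the teacher's logits. Note that the student never sees class labels; the teacher's outputs on unlabeled inputs are the only supervision in this sketch.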

Paper Structure

This paper contains 15 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Accuracy of SNNs, DNNs, and Mimic SNNs vs. # of parameters on TIMIT Dev (left) and Test (right) sets. Accuracy of the CNN and target ECNN are shown as horizontal lines for reference.
  • Figure 2: Training shallow mimic model prevents overfitting.
  • Figure 3: Accuracy of student models continues to improve as accuracy of teacher models improves.