Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks

Tom Veniat, Ludovic Denoyer

TL;DR

This work tackles the challenge of finding neural network architectures that are both accurate and cost-efficient by introducing Budgeted Super Networks (BSN). BSN leverages a large super-network (S-network) and a budgeted objective, extended through Stochastic Super Networks (SS-networks) and edge-sampling, to search over architectures under arbitrary cost constraints. Empirical results on CIFAR-10/100 and a segmentation task show that BSN can outperform ResNet and Convolutional Neural Fabrics baselines at equivalent or lower computation and memory costs, including in distributed settings. The framework provides a generic, end-to-end approach for discovering cost-aware architectures, with potential extensions to reduce training time via meta-learning and to other cost models beyond computation and memory.

Abstract

We propose to focus on the problem of discovering neural network architectures efficient in terms of both prediction quality and cost. For instance, our approach is able to solve the following tasks: learn a neural network able to predict well in less than 100 milliseconds or learn an efficient model that fits in a 50 Mb memory. Our contribution is a novel family of models called Budgeted Super Networks (BSN). They are learned using gradient descent techniques applied on a budgeted learning objective function which integrates a maximum authorized cost, while making no assumption on the nature of this cost. We present a set of experiments on computer vision problems and analyze the ability of our technique to deal with three different costs: the computation cost, the memory consumption cost and a distributed computation cost. We particularly show that our model can discover neural network architectures that have a better accuracy than the ResNet and Convolutional Neural Fabrics architectures on CIFAR-10 and CIFAR-100, at a lower cost.
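
The budgeted objective described in the abstract can be illustrated with a minimal sketch. It assumes an architecture is a binary mask over the edges of a super network, that each edge is kept independently with probability `gamma[i]`, and that the cost is additive over kept edges. The additive cost model, the penalty weight `lam`, and all function names are illustrative assumptions rather than the authors' implementation; the paper itself makes no assumption on the nature of the cost.

```python
import random


def sample_architecture(gamma, rng):
    """Sample a binary edge mask: edge i is kept with probability gamma[i]."""
    return [1 if rng.random() < p else 0 for p in gamma]


def architecture_cost(mask, edge_costs):
    """Hypothetical additive cost model: sum of the costs of the kept edges."""
    return sum(m * c for m, c in zip(mask, edge_costs))


def budgeted_objective(task_loss, cost, budget, lam):
    """Task loss plus a penalty that is active only when cost exceeds the budget."""
    return task_loss + lam * max(0.0, cost - budget)


def expected_objective(gamma, edge_costs, budget, lam, loss_fn, n_samples, rng):
    """Monte Carlo estimate of the expected budgeted objective under gamma.

    loss_fn is a stand-in for training/evaluating the sampled sub-network;
    it maps an edge mask to a scalar task loss.
    """
    total = 0.0
    for _ in range(n_samples):
        mask = sample_architecture(gamma, rng)
        cost = architecture_cost(mask, edge_costs)
        total += budgeted_objective(loss_fn(mask), cost, budget, lam)
    return total / n_samples
```

In this sketch an architecture under the budget is scored by its task loss alone, while one over the budget pays a penalty proportional to the excess cost. Swapping `architecture_cost` for a measured latency or memory footprint would leave the rest unchanged.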

Paper Structure

This paper contains 19 sections, 1 theorem, 18 equations, 8 figures, 9 tables, 1 algorithm.

Key Result

Proposition 1

(Proof in Appendix.) When the solution of the stochastic objective (Equation stochobjective) is reached, the models sampled following $\Gamma^*$ and using parameters $\theta^*$ are optimal solutions of the problem of Equation objective.
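
To make the proposition easier to read, here is one plausible form of the two objectives it relates. Only $\Gamma$ and $\theta$ appear in the text above; the symbols $H$ (a binary edge selection), $\Delta$ (task loss), $f$ (the network), $C(\cdot)$ (cost of an architecture), $\mathbf{C}$ (maximum authorized cost), $\lambda$ (penalty weight) and $\ell$ (number of training examples) are assumptions introduced for this sketch, and the exact equations in the paper may differ.

```latex
% Assumed deterministic budgeted objective over architectures H:
\min_{H,\theta}\; \frac{1}{\ell}\sum_{i=1}^{\ell}
  \Delta\big(f(x^i, H, \theta),\, y^i\big)
  + \lambda \max\big(0,\; C(H) - \mathbf{C}\big)

% Assumed stochastic counterpart over edge distributions \Gamma:
\min_{\Gamma,\theta}\; \frac{1}{\ell}\sum_{i=1}^{\ell}
  \mathbb{E}_{H \sim \Gamma}\Big[
    \Delta\big(f(x^i, H, \theta),\, y^i\big)
    + \lambda \max\big(0,\; C(H) - \mathbf{C}\big)
  \Big]
```

Under this reading, the stochastic objective is an average of the deterministic one over sampled architectures, and an average can never fall below the minimum of the values being averaged. An optimal $\Gamma^*$ therefore places its probability mass only on architectures that minimize the deterministic objective, which is the intuition behind the statement.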

Figures (8)

  • Figure 1: This figure illustrates the two Super Networks on top of which cost-constrained architectures will be discovered. The ResNet Fabric is a generalization of ResNets [DBLP:journals/corr/HeZRS15], while CNF has been proposed in [DBLP:journals/corr/SaxenaV16]. In both cases, our objective is to discover architectures that are efficient in both prediction quality and cost, by sampling edges over these S-networks.
  • Figure 2: Accuracy/Time trade-off using B-ResNet on CIFAR-10.
  • Figure 3: Discovered architectures: (Left) is a low computation cost B-ResNet where dashed edges correspond to connections in which the two convolution layers have been removed (only shortcut or projection connections are kept). (Center) is a low computation cost B-CNF where high-resolution operations have been removed. (Right) is a low memory consumption cost B-CNF: the algorithm has kept most of the high-resolution convolutions, since they allow fine-grained feature maps and have the same number of parameters as lower-resolution convolutions. It is interesting to note that our algorithm, constrained with two different costs, automatically learned two different efficient architectures.
  • Figure 4: Evolution of the loss function and the entropy of $\Gamma$ during training. The period between epoch 0 and 50 is the burn-in phase. The learning rate is divided by 10 after epoch 150 to increase the convergence speed.
  • Figure 5: Architectures discovered on CIFAR-10 for different numbers of distributed cores $n$.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Proposition 1