Learning Morphisms with Gauss-Newton Approximation for Growing Networks

Neal Lawton, Aram Galstyan, Greg Ver Steeg

TL;DR

This work proposes a NAS method that grows a network by using a Gauss-Newton approximation of the loss function to efficiently learn and evaluate candidate network morphisms. It concludes that the method learns architectures of similar or better quality at a lower computational cost.

Abstract

A popular method for Neural Architecture Search (NAS) is based on growing networks via small local changes to the network's architecture called network morphisms. These methods start with a small seed network and progressively grow the network by adding new neurons in an automated way. However, it remains a challenge to efficiently determine which parts of the network are best to grow. Here we propose a NAS method for growing a network by using a Gauss-Newton approximation of the loss function to efficiently learn and evaluate candidate network morphisms. We compare our method with state-of-the-art NAS methods on the CIFAR-10 and CIFAR-100 classification tasks, and conclude that our method learns architectures of similar or better quality at a lower computational cost.
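To make the idea of a Gauss-Newton approximation of the loss concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the toy model `tanh(X @ z)`, the squared-error loss, and all function names are assumptions chosen for illustration. It shows the general pattern of estimating the decrease in loss for a candidate update `dz` from a first-order model of the residuals, then comparing that estimate with the true decrease.

```python
import numpy as np

def residual(z, X, y):
    """Residuals of a toy nonlinear model (an assumption): tanh(X @ z) - y."""
    return np.tanh(X @ z) - y

def loss(z, X, y):
    """Squared-error loss 0.5 * ||r(z)||^2."""
    r = residual(z, X, y)
    return 0.5 * float(r @ r)

def jacobian(z, X):
    """Jacobian of the residuals with respect to z."""
    s = 1.0 - np.tanh(X @ z) ** 2
    return s[:, None] * X

def gauss_newton_estimate(z, dz, X, y):
    """Gauss-Newton model of the loss at z + dz: 0.5 * ||r + J dz||^2."""
    r = residual(z, X, y)
    Jdz = jacobian(z, X) @ dz
    return 0.5 * float(r @ r) + float(r @ Jdz) + 0.5 * float(Jdz @ Jdz)

def gauss_newton_step(z, X, y):
    """Update that minimizes the Gauss-Newton model (a least-squares solve)."""
    dz, *_ = np.linalg.lstsq(jacobian(z, X), -residual(z, X, y), rcond=None)
    return dz

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.tanh(X @ np.array([0.5, -0.3, 0.2]))
z = np.zeros(3)

dz = gauss_newton_step(z, X, y)
estimated_decrease = loss(z, X, y) - gauss_newton_estimate(z, dz, X, y)
actual_decrease = loss(z, X, y) - loss(z + dz, X, y)
print(f"estimated: {estimated_decrease:.4f}, actual: {actual_decrease:.4f}")
```

The estimate needs only the residuals and Jacobian at the current point, so many candidate updates can be scored without re-evaluating the model for each one. How the paper applies this to morphism parameters is specified by its theorems, not by this sketch.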

Paper Structure

This paper contains 12 sections, 2 theorems, 16 equations, 4 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

If ${\Delta z = \lambda \Delta z^*}$ for some $\lambda \in \mathbb R$ and some $\Delta z^*$ satisfying $A (z + \Delta z^*) = b$, then

Figures (4)

  • Figure 1: Network morphisms. Square nodes represent convolutional layers, circular nodes represent convolutional channels.
  • Figure 2: Network grown from a VGG-19 seed network by our algorithm for classifying CIFAR-100. Here the network is at the end of its $15$-th growth phase. Channels colored red will be split with their learned channel-splitting morphism parameters in the next epoch; splitting the reddest channels is estimated to give the highest loss-resource tradeoff. Channels colored blue will be pruned in the next epoch; pruning the bluest channels is estimated to give the highest loss-resource tradeoff.
  • Figure 3: Estimated versus actual decrease in loss for morphisms learned while holding model parameters constant.
  • Figure 4: Comparison of different morphism learning strategies. Each $3$-bar cluster plots the true decrease in loss achieved by the channel-splitting morphism learned by each method for one of the $64$ channels in the first layer of VGG-19.
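The channel-splitting and pruning morphisms named in the captions above can be illustrated with a small sketch. It rests on simplifying assumptions: dense layers stand in for convolutional layers, and the split is the standard function-preserving kind (both children copy the parent's incoming weights and share its outgoing weights) rather than the learned morphism parameters the paper describes. The names `split_channel`, `prune_channel`, and `alpha` are hypothetical.

```python
import numpy as np

def forward(x, W1, W2):
    """Two-layer network: x -> relu(W1 @ x) -> W2 @ h."""
    h = np.maximum(W1 @ x, 0.0)
    return W2 @ h

def split_channel(W1, W2, c, alpha=0.5):
    """Split hidden channel c into two channels without changing the output.

    Both children copy the incoming weights of channel c. The outgoing
    weights are divided between them in proportion alpha : (1 - alpha).
    """
    W1_new = np.vstack([W1, W1[c:c + 1]])                    # duplicate incoming weights
    W2_new = np.hstack([W2, (1.0 - alpha) * W2[:, c:c + 1]])  # new child's outgoing weights
    W2_new[:, c] = alpha * W2[:, c]                           # original child's share
    return W1_new, W2_new

def prune_channel(W1, W2, c):
    """Remove hidden channel c; in general this does change the output."""
    return np.delete(W1, c, axis=0), np.delete(W2, c, axis=1)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # 3 inputs -> 4 hidden channels
W2 = rng.normal(size=(2, 4))   # 4 hidden channels -> 2 outputs
x = rng.normal(size=3)

W1_split, W2_split = split_channel(W1, W2, c=1, alpha=0.3)
W1_pruned, W2_pruned = prune_channel(W1, W2, c=1)
print(np.allclose(forward(x, W1, W2), forward(x, W1_split, W2_split)))
```

A function-preserving split leaves the loss unchanged at the moment of splitting. A learned split, as described in the Figure 2 caption, instead chooses the new channel's parameters so that the loss is expected to decrease.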

Theorems & Definitions (4)

  • Theorem 1: General Gauss-Newton Approximation
  • proof
  • Theorem 2: Rank-1 Gauss-Newton Approximation
  • proof