Generalisable Agents for Neural Network Optimisation

Kale-ab Tessera; Callum Rhys Tilbury; Sasha Abramowitz; Ruan de Kock; Omayma Mahjoub; Benjamin Rosman; Sara Hooker; Arnu Pretorius

TL;DR

This paper uses GANNO to control the layerwise learning rate and shows that the framework can yield useful and responsive schedules that are competitive with handcrafted heuristics.

Abstract

Optimising deep neural networks is a challenging task due to complex training dynamics, high computational requirements, and long training times. To address this difficulty, we propose the framework of Generalisable Agents for Neural Network Optimisation (GANNO) -- a multi-agent reinforcement learning (MARL) approach that learns to improve neural network optimisation by dynamically and responsively scheduling hyperparameters during training. GANNO utilises an agent per layer that observes localised network dynamics and accordingly takes actions to adjust these dynamics at a layerwise level to collectively improve global performance. In this paper, we use GANNO to control the layerwise learning rate and show that the framework can yield useful and responsive schedules that are competitive with handcrafted heuristics. Furthermore, GANNO is shown to perform robustly across a wide variety of unseen initial conditions, and can successfully generalise to harder problems than it was trained on. Our work presents an overview of the opportunities that this paradigm offers for training neural networks, along with key challenges that remain to be overcome.
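
The control loop described in the abstract (one agent per layer, local observations, layerwise learning rate actions, periodic reward) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the "network" is one scalar weight per layer with a quadratic loss, and the action set, observation contents, reward definition, learning rate bounds, and the names `ACTIONS`, `local_observation`, `placeholder_policy`, and `run_episode` are all hypothetical. In particular, the policy here acts at random, whereas GANNO learns its agents with MARL.

```python
import random

# Assumed discrete action set: each agent rescales its layer's learning rate.
ACTIONS = (0.5, 1.0, 2.0)
LR_MIN, LR_MAX = 1e-5, 0.4  # assumed bounds that keep the toy problem stable


def local_observation(grad, lr):
    """Layer-specific observation (assumed): gradient magnitude and current lr."""
    return (abs(grad), lr)


def placeholder_policy(obs, rng):
    """Stand-in for a trained agent: ignores obs and picks a random action."""
    return rng.choice(ACTIONS)


def run_episode(num_layers=2, agent_steps=5, inner_steps=10, init_lr=0.01, seed=0):
    rng = random.Random(seed)
    # Toy "network": one scalar weight per layer, loss = sum of w_i^2.
    weights = [1.0] * num_layers
    lrs = [init_lr] * num_layers
    rewards = []
    for _ in range(agent_steps):
        loss_before = sum(w * w for w in weights)
        # Each agent observes its own layer and adjusts that layer's lr.
        for i in range(num_layers):
            obs = local_observation(2.0 * weights[i], lrs[i])
            scaled = lrs[i] * placeholder_policy(obs, rng)
            lrs[i] = min(max(scaled, LR_MIN), LR_MAX)
        # Training progresses for some steps with the chosen learning rates.
        for _ in range(inner_steps):
            for i in range(num_layers):
                weights[i] -= lrs[i] * 2.0 * weights[i]  # gradient of w^2 is 2w
        loss_after = sum(w * w for w in weights)
        # Assumed reward: loss improvement since the last agent step.
        rewards.append(loss_before - loss_after)
    return lrs, weights, rewards


if __name__ == "__main__":
    final_lrs, final_weights, episode_rewards = run_episode()
    print(final_lrs, episode_rewards)
```

In a learned setting, `placeholder_policy` would be replaced by a policy trained on the returned rewards; the surrounding loop structure is the part this sketch is meant to show.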

Paper Structure

This paper contains 10 sections, 7 figures, and 5 tables.

Introduction
Background
Related Work
Methodology
Results
Challenges, Opportunities, and Future Work
Conclusion
Hyperparameters
Manual Schedules
Extended Results

Figures (7)

Figure 1: GANNO's training process. There is an agent per layer of a neural network. Each agent receives a set of global and layer-specific observations about the environment and uses this information to select an action, which is applied to a corresponding layer. Then, training in the environment progresses for some time, after which a reward signal is returned and this loop continues.
Figure 2: GANNO's dynamic learning rate and corresponding training loss on Fashion-MNIST, shown at episodes 0 and 40. The first episode of MARL training is shown in orange, and a later stage of training, episode 40, is shown in blue. Both training and evaluation use a two-layer CNN. We observe clear evidence of a useful schedule being learned, which improves the classification loss.
Figure 3: GANNO's learning rate schedule dynamically escaping local optima during training for a two-layer CNN on Fashion-MNIST. In both layers, at key moments (around $14\,000$ and $29\,000$ training steps), GANNO spikes the learning rate, thereby escaping a local optimum and improving performance.
Figure 4: Robustness of GANNO on ResNet-18. Test accuracy across epochs for a random agent, VeLO, and GANNO, evaluated on ResNet-18 on CIFAR-10, with an initial learning rate of $0.001$ for GANNO and the random agent. We see that GANNO produces robust and competitive schedules that are better able to handle different weight decay values.
Figure 5: Nine common handcrafted learning rate schedules used at various points in the paper.
...and 2 more figures
