In value-based deep reinforcement learning, a pruned network is a good network

Johan Obando-Ceron; Aaron Courville; Pablo Samuel Castro

TL;DR

Gradual magnitude pruning enables value-based agents to maximize parameter effectiveness, yielding networks that dramatically outperform traditional networks while using only a small fraction of the full network parameters.

Abstract

Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks, using only a small fraction of the full network parameters.
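
To make the idea of gradual magnitude pruning concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a cubic polynomial sparsity schedule of the form s_t = s_F * (1 - (1 - (t - t_start) / (t_end - t_start))^3), which is one common choice for gradual pruning; the function names, the exponent, and the start and end steps are hypothetical values chosen for illustration. Only the 95% target sparsity comes from the figure captions below.

```python
import numpy as np


def polynomial_sparsity(step, start_step, end_step, final_sparsity=0.95, power=3):
    """Target sparsity at `step` under an assumed polynomial schedule.

    Sparsity is 0 before `start_step`, rises smoothly, and stays at
    `final_sparsity` from `end_step` onward. The exponent is an assumption.
    """
    if step <= start_step:
        return 0.0
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity * (1.0 - (1.0 - progress) ** power)


def magnitude_mask(weights, sparsity):
    """Boolean mask that drops the `sparsity` fraction of smallest-magnitude weights."""
    n_prune = int(round(sparsity * weights.size))
    if n_prune == 0:
        return np.ones(weights.shape, dtype=bool)
    flat = np.abs(weights).ravel()
    # Indices of the n_prune smallest magnitudes (order within them is irrelevant).
    prune_idx = np.argpartition(flat, n_prune - 1)[:n_prune]
    mask = np.ones(flat.shape, dtype=bool)
    mask[prune_idx] = False
    return mask.reshape(weights.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64))
    # Hypothetical schedule: prune between step 20 and step 80 of 100.
    for t in (0, 20, 50, 80, 100):
        s = polynomial_sparsity(t, start_step=20, end_step=80)
        mask = magnitude_mask(w, s)
        print(t, round(s, 4), round(1.0 - mask.mean(), 4))
```

In a training loop, the mask would be recomputed at each pruning step and multiplied into the weights after every optimizer update, so that pruned weights stay at zero.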

Paper Structure (43 sections, 2 equations, 30 figures, 4 tables)


Introduction
Related Work
Scaling in Deep RL
Sparse Models in Deep RL
Overparameterization in Deep RL
Background
Deep reinforcement learning
Gradual pruning
Pruning can boost deep RL performance
Implementation details
Online RL
Low data regime
Offline RL
Actor-Critic methods
Stability of the pruned network
...and 28 more sections

Figures (30)

Figure 1: Scaling network widths for DQN and Rainbow with an Impala-based ResNet architecture (Espeholt et al., 2018). We report the interquartile mean after 40 million environment steps, aggregated over 15 games with 5 seeds each; error bars indicate 95% stratified bootstrap confidence intervals. The replay ratio is fixed to the standard $0.25$. The default network is Dense, which we indicate in blue in all plots for clarity.
Figure 2: Gradual magnitude pruning schedules used in our experiments, to a target sparsity of 95%, as specified in \ref{eqn:polynomialSchedule}. For the impact of varying pruning schedules, see \ref{fig:varyingSchedules}.
Figure 3: Evaluating how varying sparsity affects performance for DQN with the ResNet architecture and a width multiplier of 3. See Section \ref{subsec:setup} for training details.
Figure 4: Scaling network widths for the original CNN architecture of Mnih et al. (2015), for DQN (left) and Rainbow (right). See Section \ref{subsec:setup} for training details.
Figure 5: Scaling replay ratio for Rainbow with the ResNet architecture and a width multiplier of $3$. The default replay ratio is $0.25$. See Section \ref{subsec:setup} for training details.
...and 25 more figures
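
Several captions above report the interquartile mean of scores aggregated over games and seeds. As a rough illustration of that statistic only, here is a small NumPy sketch. It is a simplification under stated assumptions: the function name is hypothetical, the number of runs dropped from each tail is truncated to an integer, and the stratified bootstrap confidence intervals mentioned in Figure 1 are not reproduced.

```python
import numpy as np


def interquartile_mean(scores):
    """Mean of the middle 50% of `scores` (bottom and top quarters dropped).

    Simplification: the number of values dropped per tail is floor(n / 4),
    so the result is only an approximation when n is not a multiple of 4.
    """
    ordered = np.sort(np.asarray(scores, dtype=float).ravel())
    n = ordered.size
    cut = n // 4  # values discarded from each tail
    return ordered[cut:n - cut].mean()


if __name__ == "__main__":
    # A single outlier barely moves the statistic, unlike the plain mean.
    runs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 100.0]
    print(interquartile_mean(runs), np.mean(runs))
```

Dropping the tails makes the aggregate less sensitive to a few unusually good or bad runs than the mean, while using more of the data than the median.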
