In value-based deep reinforcement learning, a pruned network is a good network

Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro

TL;DR

Gradual magnitude pruning enables value-based agents to maximize parameter effectiveness, yielding networks that dramatically outperform traditional networks while using only a small fraction of the full network parameters.

Abstract

Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks, using only a small fraction of the full network parameters.

Paper Structure (43 sections, 2 equations, 30 figures, 4 tables)

Figures (30)

  • Figure 1: Scaling network widths for DQN and Rainbow with an Impala-based ResNet architecture (Espeholt et al., 2018). We report the interquartile mean (IQM) after 40 million environment steps, aggregated over 15 games with 5 seeds each; error bars indicate 95% stratified bootstrap confidence intervals. The replay ratio is fixed to the standard $0.25$. The default network is Dense, which is shown in blue in all plots for clarity.
  • Figure 2: Gradual magnitude pruning schedules used in our experiments, to a target sparsity of 95%, as specified in Equation \ref{eqn:polynomialSchedule}. For the impact of varying pruning schedules, see Figure \ref{fig:varyingSchedules}.
  • Figure 3: Evaluating how varying sparsity affects performance for DQN with the ResNet architecture and a width multiplier of 3. See Section \ref{subsec:setup} for training details.
  • Figure 4: Scaling network widths for the original CNN architecture of Mnih et al. (2015), for DQN (left) and Rainbow (right). See Section \ref{subsec:setup} for training details.
  • Figure 5: Scaling replay ratio for Rainbow with the ResNet architecture and a width multiplier of $3$. The default replay ratio is $0.25$. See Section \ref{subsec:setup} for training details.
  • ...and 25 more figures
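
To make the gradual magnitude pruning schedule mentioned in the Figure 2 caption more concrete, here is a minimal sketch in Python. The paper's own equation is not reproduced in this listing, so the sketch assumes the commonly used cubic polynomial form: sparsity ramps from 0 to the target between a start step and an end step. The names `target_sparsity`, `magnitude_mask`, `start_step`, `end_step`, and `power` are hypothetical, chosen for illustration only. The 95% default mirrors the target sparsity stated in the caption.

```python
import numpy as np


def target_sparsity(step, start_step, end_step, final_sparsity=0.95, power=3):
    """Polynomial sparsity schedule (assumed cubic form, not taken from the paper)."""
    if step <= start_step:
        return 0.0  # no pruning before the schedule starts
    if step >= end_step:
        return final_sparsity  # hold the target sparsity after the ramp ends
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity * (1.0 - (1.0 - progress) ** power)


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude fraction of weights."""
    n_prune = int(round(sparsity * weights.size))
    if n_prune == 0:
        return np.ones_like(weights)
    flat = np.abs(weights).ravel()
    # Indices of the n_prune entries with the smallest absolute value.
    prune_idx = np.argpartition(flat, n_prune - 1)[:n_prune]
    mask = np.ones(weights.size, dtype=weights.dtype)
    mask[prune_idx] = 0
    return mask.reshape(weights.shape)


# Example: prune a small weight matrix at the sparsity scheduled for step 50.
w = np.array([[0.1, -2.0], [0.05, 3.0]])
s = target_sparsity(step=50, start_step=20, end_step=80)
pruned = w * magnitude_mask(w, s)
```

In a training loop, such a mask would typically be recomputed at regular intervals and applied to each layer's weights, so the network becomes sparser over the course of training rather than all at once.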