Training Larger Networks for Deep Reinforcement Learning

Kei Ota, Devesh K. Jha, Asako Kanezaki

TL;DR

The paper addresses the challenge that larger neural networks do not reliably improve performance in deep RL. It introduces a three-fold approach to enable stable and effective training of very large networks: DenseNet-based wide networks, decoupled representation learning via an auxiliary next-state prediction task (OFENet), and Ape-X-like distributed training. Empirical results with SAC and TD3 on MuJoCo locomotion tasks show consistent gains from widening networks, aided by improved representation learning and abundant on-policy data; effective rank analyses corroborate reduced feature-space collapse. The findings demonstrate that large-scale networks can boost deep RL performance and offer actionable architectural guidance for scalable RL in continuous control tasks.

Abstract

The success of deep learning in the computer vision and natural language processing communities can be attributed to training very deep neural networks, with millions or billions of parameters, on massive amounts of data. However, a similar trend has largely eluded deep reinforcement learning (RL) algorithms, where larger networks do not lead to performance improvement. Previous work has shown that this is mostly due to instability during training of deep RL agents when using larger networks. In this paper, we make an attempt to understand and address training of larger networks for deep RL. We first show that naively increasing network capacity does not improve performance. Then, we propose a novel method that consists of 1) wider networks with DenseNet connections, 2) decoupling representation learning from training of RL, and 3) a distributed training method to mitigate overfitting problems. Using this three-fold technique, we show that we can train very large networks that result in significant performance gains. We present several ablation studies to demonstrate the efficacy of the proposed method and to give some intuitive understanding of the reasons for the performance gain. We show that our proposed method outperforms other baseline algorithms on several challenging locomotion tasks.

Paper Structure

This paper contains 33 sections, 3 equations, 22 figures, 2 tables.

Figures (22)

  • Figure 1: Average return.
  • Figure 2: Loss surface.
  • Figure 4: Proposed architecture to train larger networks for deep RL agents. We combine three elements. First, we decouple representation learning from RL to extract an informative feature $z_{s_t}$ from the current state $s_t$, using a feature extractor network trained on the auxiliary task of predicting the next state $s_{t+1}$. Second, we use large networks with a DenseNet architecture, which allows stronger feature propagation. Finally, we employ an Ape-X-like distributed training framework to mitigate the overfitting problems that tend to occur in larger networks and to collect more on-policy data, which can improve performance. FC refers to a fully-connected layer.
  • Figure 5: Average return.
  • Figure 6: Loss surface.
  • ...and 17 more figures
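
The first two elements in the Figure 4 caption (a DenseNet-connected feature extractor trained on next-state prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DenseFeatureExtractor`, the helper functions, the layer sizes, the linear prediction head, and the mean-squared-error loss are all assumptions chosen for illustration. For brevity the predictor conditions only on the state; a dynamics predictor would typically also take the action. The sketch shows only the forward pass and the auxiliary loss, not gradient updates or the distributed training.

```python
import random


def linear(x, weights, bias):
    """Fully connected (FC) layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(rng, in_dim, out_dim):
    """Small random weights and zero biases (illustrative initialisation)."""
    scale = 1.0 / in_dim ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class DenseFeatureExtractor:
    """Hypothetical DenseNet-style MLP: every FC layer receives the raw
    state concatenated with the outputs of all earlier layers."""

    def __init__(self, state_dim, hidden_dim, num_layers, seed=0):
        rng = random.Random(seed)
        self.layers = []
        in_dim = state_dim
        for _ in range(num_layers):
            self.layers.append(init_layer(rng, in_dim, hidden_dim))
            in_dim += hidden_dim  # concatenation widens the next input
        self.feature_dim = in_dim
        # Assumed linear head for the auxiliary task: z_{s_t} -> s_{t+1}.
        self.head = init_layer(rng, self.feature_dim, state_dim)

    def features(self, state):
        """Return z_{s_t}: the state followed by every layer's output."""
        z = list(state)
        for weights, bias in self.layers:
            z = z + relu(linear(z, weights, bias))  # dense connection
        return z

    def predict_next_state(self, state):
        weights, bias = self.head
        return linear(self.features(state), weights, bias)


def next_state_loss(model, state, next_state):
    """Mean squared error between predicted and observed next state."""
    pred = model.predict_next_state(state)
    return sum((p - t) ** 2 for p, t in zip(pred, next_state)) / len(next_state)


model = DenseFeatureExtractor(state_dim=3, hidden_dim=4, num_layers=2)
z = model.features([0.1, -0.2, 0.3])
loss = next_state_loss(model, [0.1, -0.2, 0.3], [0.0, 0.0, 0.0])
```

With `state_dim=3`, `hidden_dim=4`, and `num_layers=2`, the feature $z_{s_t}$ has 3 + 4 + 4 = 11 entries, and its first three entries are the raw state. Because of the concatenation, the downstream RL networks would see both the original observation and the learned features.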