Training Larger Networks for Deep Reinforcement Learning

Kei Ota; Devesh K. Jha; Asako Kanezaki

TL;DR

The paper addresses the problem that larger neural networks do not reliably improve performance in deep RL. It introduces a three-fold approach to enable stable and effective training of very large networks: DenseNet-based wide networks, decoupled representation learning via an auxiliary next-state prediction task (OFENet), and Ape-X-like distributed training. Experiments with SAC and TD3 on MuJoCo locomotion tasks show consistent gains from widening networks, aided by improved representation learning and abundant on-policy data; effective rank analyses corroborate reduced feature-space collapse. The findings indicate that large networks can boost deep RL performance, and they offer architectural guidance for scaling RL in continuous control tasks.

Abstract

The success of deep learning in the computer vision and natural language processing communities can be attributed to the training of very deep neural networks, with millions or billions of parameters, on massive amounts of data. However, a similar trend has largely eluded deep reinforcement learning (RL), where larger networks do not lead to performance improvement. Previous work has shown that this is mostly due to instability when training deep RL agents with larger networks. In this paper, we attempt to understand and address the training of larger networks for deep RL. We first show that naively increasing network capacity does not improve performance. Then, we propose a novel method that consists of 1) wider networks with DenseNet connections, 2) decoupling representation learning from the training of RL, and 3) a distributed training method to mitigate overfitting problems. Using this three-fold technique, we show that we can train very large networks that result in significant performance gains. We present several ablation studies to demonstrate the efficacy of the proposed method and to give some intuitive understanding of the reasons for the performance gain. We show that our proposed method outperforms other baseline algorithms on several challenging locomotion tasks.
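
To make the first ingredient concrete, the following is a minimal NumPy sketch of a multilayer perceptron with DenseNet-style connections, in which each layer receives the concatenation of the original input and all earlier layer outputs. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`init_dense_mlp`, `dense_mlp_forward`), the ReLU activation, the He-style initialization, the layer count, and the 17-dimensional observation are all hypothetical choices made for the example.

```python
import numpy as np


def init_dense_mlp(in_dim, hidden_units, num_layers, seed=0):
    """Create weights for a densely connected MLP (hypothetical helper).

    Layer i takes the original input plus the outputs of all previous
    layers, so its input width is in_dim + i * hidden_units.
    """
    rng = np.random.default_rng(seed)
    weights, biases = [], []
    dim = in_dim
    for _ in range(num_layers):
        # He-style initialization; an assumption, not taken from the paper.
        weights.append(rng.normal(0.0, np.sqrt(2.0 / dim), size=(dim, hidden_units)))
        biases.append(np.zeros(hidden_units))
        dim += hidden_units  # the next layer also sees this layer's output
    return weights, biases


def dense_mlp_forward(x, weights, biases):
    """Forward pass: concatenate each layer's output onto its own input."""
    features = x
    for W, b in zip(weights, biases):
        h = np.maximum(features @ W + b, 0.0)  # ReLU (assumed activation)
        features = np.concatenate([features, h], axis=-1)
    return features


# A batch of 4 observations with 17 dimensions (illustrative sizes only).
obs = np.ones((4, 17))
weights, biases = init_dense_mlp(in_dim=17, hidden_units=256, num_layers=2)
out = dense_mlp_forward(obs, weights, biases)
print(out.shape)  # (4, 529): 17 input dims + 2 layers * 256 units
```

Because the raw input is carried through every concatenation, widening `hidden_units` adds capacity without removing the direct path from the observation to later layers.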
