D2RL: Deep Dense Architectures in Reinforcement Learning

Samarth Sinha, Homanga Bharadhwaj, Aravind Srinivas, Animesh Garg

TL;DR

This work identifies a core bottleneck in deep reinforcement learning (DRL): naively deepened MLPs propagate input information poorly because of the data processing inequality (DPI), which hinders optimization and sample efficiency. It proposes D2RL, a deep dense architecture for reinforcement learning that concatenates the input to every hidden layer, so that networks can be made deeper while preserving input information in both the policy and value-function representations. Across a wide range of manipulation and locomotion tasks and multiple off-policy algorithms (e.g., SAC, TD3), D2RL yields significant gains in sample efficiency and asymptotic performance, and it outperforms ResNet-style alternatives. The results suggest that architectural inductive biases, specifically dense connectivity, are a practical and broadly applicable lever for improving DRL in robotics; the code is publicly available.
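
For reference, the data processing inequality says that post-processing cannot add information about its source. For any Markov chain $X \to Y \to Z$,

$$I(X; Z) \le I(X; Y).$$

As an illustrative reading (the symbols $s$, $h_i$, and $L$ are notation assumed here, not taken from the paper), a plain MLP that computes hidden activations $h_1, \dots, h_L$ from an input $s$ forms the chain $s \to h_1 \to \dots \to h_L$, so

$$I(s; h_L) \le I(s; h_{L-1}) \le \dots \le I(s; h_1).$$

With dense connections, layer $i+1$ reads the concatenation $[h_i, s]$ rather than $h_i$ alone, and $I(s; [h_i, s]) \ge I(s; h_i)$. Under this idealized view, the information about $s$ available to later layers is no longer capped by what earlier layers happened to retain.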

Abstract

While improvements in deep learning architectures have played a crucial role in improving the state of supervised and unsupervised learning in computer vision and natural language processing, neural network architecture choices for reinforcement learning remain relatively under-explored. We take inspiration from successful architectural choices in computer vision and generative modelling, and investigate the use of deeper networks and dense connections for reinforcement learning on a variety of simulated robotic learning benchmark environments. Our findings reveal that current methods benefit significantly from dense connections and deeper networks, across a suite of manipulation and locomotion tasks, for both proprioceptive and image-based observations. We hope that our results can serve as a strong baseline and further motivate future research into neural network architectures for reinforcement learning. The project website with code is at https://sites.google.com/view/d2rl/home.

Paper Structure

This paper contains 21 sections, 4 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Visual illustrations of the proposed dense-connections based D2RL modification to the policy $\pi_\phi(\cdot)$ and Q-value $Q_\theta(\cdot)$ neural network architectures. The inputs are passed to each layer of the neural network through identity mappings. The forward pass corresponds to moving from left to right in the figure. For state-based environments, $s_t$ is the observed simulator state and there is no convolutional encoder.
  • Figure 2: The effect of increasing the number of fully-connected layers used to parameterize the policy and Q-networks for Soft Actor-Critic (SAC) on Ant-v2 in the OpenAI Gym suite. Performance drops when depth is increased beyond 2 layers. However, our D2RL agent with 4 layers does not suffer from this, and performs better.
  • Figure 3: OpenAI Gym benchmark environments with SAC. Comparison of the proposed D2RL and the baselines on a suite of OpenAI Gym environments. We apply the D2RL modification to SAC. The error bars are with respect to 5 random seeds. The results on the Humanoid environment are in the Appendix.
  • Figure 4: OpenAI Gym benchmark environments with TD3. Comparison of the proposed D2RL variation and the baselines on a suite of OpenAI Gym environments. We apply the D2RL modification to TD3. The error bars are with respect to 5 random seeds.
  • Figure 5: Selected challenging manipulation and locomotion environments. Comparison of the proposed D2RL variation and the baselines on a suite of challenging manipulation and locomotion environments. We apply the D2RL modification to the SAC, HER, and HIRO algorithms and compare relative performance in terms of average episodic rewards with respect to the baselines. The task complexity increases from Fetch Reach to Fetch Slide. Jaco Reach is challenging due to its high-dimensional torque-controller action space, AntMaze requires exploration to solve a temporally extended problem, and Furniture BlockJoin requires solving two tasks, join and lift, sequentially. The error bars are with respect to 5 random seeds. Some additional results on the Fetch environments are in the Appendix.
  • ...and 3 more figures
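
The dense connections described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `init_dense_mlp` and `dense_mlp_forward`, the hidden width of 256, the ReLU activation, the weight initialization, and the state and action sizes are all hypothetical choices. Only the idea of re-feeding the raw input to each hidden layer, and the 4-layer depth mentioned for Figure 2, come from the text above. The sketch also assumes that the output head reads the last hidden activation without a further concatenation.

```python
import numpy as np


def init_dense_mlp(in_dim, hidden_dim, num_layers, out_dim, seed=0):
    """Create weights for an MLP whose hidden layers also see the raw input."""
    rng = np.random.default_rng(seed)
    params = []
    layer_in = in_dim
    for _ in range(num_layers):
        scale = 1.0 / np.sqrt(layer_in)
        W = rng.normal(0.0, scale, size=(layer_in, hidden_dim))
        params.append((W, np.zeros(hidden_dim)))
        # Every later hidden layer reads [previous activation, raw input].
        layer_in = hidden_dim + in_dim
    # Assumption of this sketch: the output head reads only the last activation.
    W_out = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), size=(hidden_dim, out_dim))
    params.append((W_out, np.zeros(out_dim)))
    return params


def dense_mlp_forward(params, x):
    """Forward pass: concatenate the input x after each hidden layer but the last."""
    hidden, (W_out, b_out) = params[:-1], params[-1]
    h = x
    for i, (W, b) in enumerate(hidden):
        h = np.maximum(h @ W + b, 0.0)  # ReLU
        if i < len(hidden) - 1:
            h = np.concatenate([h, x], axis=-1)  # dense (identity) connection
    return h @ W_out + b_out


# Hypothetical Q-network: the input is the state-action pair.
state = np.ones((5, 11))   # batch of 5 states, 11 dims (made-up sizes)
action = np.ones((5, 3))   # batch of 5 actions, 3 dims (made-up sizes)
q_input = np.concatenate([state, action], axis=-1)
q_params = init_dense_mlp(in_dim=14, hidden_dim=256, num_layers=4, out_dim=1)
q_values = dense_mlp_forward(q_params, q_input)
print(q_values.shape)  # (5, 1)
```

The only structural difference from a plain MLP is the input width of layers 2 through 4, which grows from 256 to 256 + 14 in this example. A policy network would follow the same pattern with the state alone as input.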