Continuous Control Reinforcement Learning: Distributed Distributional DrQ Algorithms

Zehao Zhou

TL;DR

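DrQ-v2 uses DDPG as its backbone and performs well on various continuous control tasks. Distributed Distributional DrQ replaces that backbone with Distributed Distributional DDPG (D4PG), aiming for better performance through the greater expressive power of a distributional value function and distributed actor policies.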

Abstract

Distributed Distributional DrQ is a model-free, off-policy RL algorithm for continuous control tasks that learns from the agent's states and observations. It is an actor-critic method that combines data augmentation with a distributional perspective on the critic's value function. The goal is to learn to control the agent and master tasks in a high-dimensional continuous space. DrQ-v2 uses DDPG as its backbone and performs well on various continuous control tasks. Distributed Distributional DrQ instead uses Distributed Distributional DDPG (D4PG) as its backbone; this modification aims to achieve better performance on some hard continuous control tasks through the greater expressive power of a distributional value function and distributed actor policies.

Paper Structure (13 sections, 12 equations, 3 figures, 2 tables, 3 algorithms)

This paper contains 13 sections, 12 equations, 3 figures, 2 tables, 3 algorithms.

Introduction
Related Work
Preliminaries
Markov Decision Process & Reinforcement Learning
Deep Deterministic Policy Gradient
D4PG: Distributed version of DDPG
Distributed Distributional DrQ
Data preprocess
Distributed Distributional Deep Deterministic Policy Gradient
Categorical distribution
Conclusion
Hyper-parameters
Mujoco Control suite details

Figures (3)

Figure 1: DeepMind Control Suite benchmark tasks. Domains include acrobot, cartpole, cheetah, finger, hopper, humanoid, manipulator, pendulum, reacher, swimmer6, and walker. Picture from BarthMaron2018DistributedDD.
Figure 2: A distributional perspective on the value function: (a) the one-step distribution under policy $\pi$, (b) discounting the distribution, (c) adding the reward to the distribution, (d) projecting the distribution. Picture from Bellemare2017ADP.
Figure 3: Output parameters of the last layer of the critic network for different value function types: categorical distribution, mixture of Gaussians, and standard scalar value output. Picture from BarthMaron2018DistributedDD.
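
The four steps in the Figure 2 caption can be illustrated with a small sketch of the categorical projection used for distributional critics. This is a minimal pure-Python illustration under stated assumptions, not the paper's implementation: the function name `project_categorical`, the five-atom support, and the reward and discount values are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def project_categorical(probs, atoms, reward, discount):
    """Project the distribution of reward + discount * Z onto fixed atoms.

    probs: probabilities of the next-state return distribution, one per atom.
    atoms: evenly spaced support values, sorted in ascending order.
    All names here are illustrative assumptions, not taken from the paper.
    """
    v_min, v_max = atoms[0], atoms[-1]
    delta = (v_max - v_min) / (len(atoms) - 1)
    projected = [0.0] * len(atoms)
    for p, z in zip(probs, atoms):
        # Steps (b) and (c): discount the atom, add the reward,
        # then clip the result to the range covered by the support.
        tz = min(max(reward + discount * z, v_min), v_max)
        # Step (d): split the probability mass between the two nearest atoms
        # in proportion to how close the shifted value is to each.
        pos = (tz - v_min) / delta
        lower = int(math.floor(pos))
        upper = min(lower + 1, len(atoms) - 1)
        frac = pos - lower
        projected[lower] += p * (1.0 - frac)
        projected[upper] += p * frac
    return projected


# Hypothetical example: all mass on the atom at 2.0, reward 0.5, discount 0.5.
# The shifted value 0.5 + 0.5 * 2.0 = 1.5 lies halfway between atoms 1.0 and 2.0.
atoms = [0.0, 1.0, 2.0, 3.0, 4.0]
example = project_categorical([0.0, 0.0, 1.0, 0.0, 0.0], atoms, reward=0.5, discount=0.5)
print(example)  # [0.0, 0.5, 0.5, 0.0, 0.0]
```

The projection keeps the total probability at 1, so the result can serve as the target distribution for a cross-entropy loss against the critic's predicted categorical output.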
