Continuous Control Reinforcement Learning: Distributed Distributional DrQ Algorithms

Zehao Zhou

TL;DR

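DrQ-v2 uses DDPG as its backbone and performs strongly on various continuous control tasks. Distributed Distributional DrQ instead builds on Distributed Distributional DDPG, aiming for better performance on hard tasks through the greater expressive power of a distributional value function and distributed actor policies.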

Abstract

Distributed Distributional DrQ is a model-free, off-policy RL algorithm for continuous control tasks that learns from the agent's states and observations. It is an actor-critic method that combines data augmentation with a distributional perspective on the critic's value function. The aim is to learn to control the agent and master tasks in a high-dimensional continuous space. DrQ-v2 uses DDPG as its backbone and achieves strong performance on various continuous control tasks. Distributed Distributional DrQ instead uses Distributed Distributional DDPG as its backbone; this modification aims to improve performance on some hard continuous control tasks through the greater expressive power of a distributional value function and distributed actor policies.

Paper Structure

This paper contains 13 sections, 12 equations, 3 figures, 2 tables, 3 algorithms.

Figures (3)

  • Figure 1: DeepMind Control Suite benchmark tasks. Domains include acrobot, cartpole, cheetah, finger, hopper, humanoid, manipulator, pendulum, reacher, swimmer6, and walker. Picture from BarthMaron2018DistributedDD.
  • Figure 2: A distributional perspective on the value function: (a) the one-step distribution under policy $\pi$, (b) discount the distribution, (c) add the reward to the distribution, (d) project the distribution. Picture from Bellemare2017ADP.
  • Figure 3: Output parameters of the last layer of the critic network for different value function types: a categorical distribution, a mixture of Gaussians, and a standard scalar value. Picture from BarthMaron2018DistributedDD.
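
The four steps in the Figure 2 caption (discount, add the reward, project) can be illustrated for the categorical critic output mentioned in Figure 3. The following is a minimal sketch, not the paper's implementation: it assumes a categorical distribution over evenly spaced atoms in a fixed range, and the function name `project_categorical` and its parameters `v_min` and `v_max` are hypothetical names chosen for illustration.

```python
import math


def project_categorical(probs, reward, discount, v_min, v_max):
    """Project the distribution of reward + discount * Z onto fixed atoms.

    probs holds the probabilities of len(probs) evenly spaced atoms
    spanning [v_min, v_max] (an assumption of this sketch).
    """
    n = len(probs)
    delta = (v_max - v_min) / (n - 1)          # spacing between atoms
    atoms = [v_min + i * delta for i in range(n)]
    projected = [0.0] * n
    for p, z in zip(probs, atoms):
        # Steps (b) and (c): discount the atom, add the reward,
        # then clip the result to the supported range.
        tz = min(max(reward + discount * z, v_min), v_max)
        # Step (d): fractional index of the shifted atom on the grid,
        # clamped to guard against floating-point overshoot.
        b = min(max((tz - v_min) / delta, 0.0), float(n - 1))
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:
            # The shifted atom lands exactly on a grid atom.
            projected[lo] += p
        else:
            # Split the mass between the two neighbouring atoms,
            # proportionally to proximity.
            projected[lo] += p * (hi - b)
            projected[hi] += p * (b - lo)
    return projected


# All mass on the atom at value 0, with atoms at -2, -1, 0, 1, 2.
example = project_categorical(
    [0.0, 0.0, 1.0, 0.0, 0.0], reward=0.5, discount=0.9, v_min=-2.0, v_max=2.0
)
print(example)
```

In this example the single atom at 0 is shifted to 0.5, which lies halfway between the atoms at 0 and 1, so its mass is split evenly between them. The projection keeps the total probability mass unchanged, so the result can be used directly as a target for a categorical critic.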