Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning

Motoki Omura; Takayuki Osa; Yusuke Mukuta; Tatsuya Harada

TL;DR

Symmetric Q-learning adds synthetic noise drawn from a zero-mean distribution to the target values so that the Bellman error distribution becomes closer to Gaussian. By reducing the skewness of the error distribution, it improved the sample efficiency of a state-of-the-art reinforcement learning method.

Abstract

In deep reinforcement learning, estimating the value function to evaluate the quality of states and actions is essential. The value function is often trained using the least squares method, which implicitly assumes a Gaussian error distribution. However, a recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator, which violates the implicit assumption of a normal error distribution in the least squares method. To address this, we propose Symmetric Q-learning, in which synthetic noise generated from a zero-mean distribution is added to the target values so that the resulting error distribution is closer to Gaussian. We evaluated the proposed method on continuous control benchmark tasks in MuJoCo, where it improved the sample efficiency of a state-of-the-art reinforcement learning method by reducing the skewness of the error distribution.
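The core idea can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a right-skewed Gumbel sample as a hypothetical stand-in for Bellman errors, and it draws the correction noise by resampling the negated errors rather than by training a noise distribution as the paper does. Because the Bellman error is the target minus the current estimate, adding noise to the target adds the same noise to the error. The names `skewness`, `errors`, `noise`, and `corrected` are illustrative only.

```python
import numpy as np


def skewness(x):
    """Sample skewness: the third standardized moment."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)


rng = np.random.default_rng(0)

# Hypothetical stand-in for Bellman errors: a centered, right-skewed Gumbel sample.
errors = rng.gumbel(loc=0.0, scale=1.0, size=100_000)
errors -= errors.mean()

# Correction noise whose distribution mirrors the errors (the distribution of -errors).
# Simplification: resampled from the empirical errors instead of a learned distribution.
noise = -rng.choice(errors, size=errors.size, replace=True)
noise -= noise.mean()  # enforce zero mean so the targets are not biased on average

# Adding the noise to the targets shifts each error by the same amount.
corrected = errors + noise

skew_before = skewness(errors)
skew_after = skewness(corrected)
print(f"skewness before correction: {skew_before:.3f}")
print(f"skewness after correction:  {skew_after:.3f}")
```

In this sketch the corrected error is the difference of two independent draws from the same distribution, which is symmetric by construction, so its skewness should be close to zero while its mean is unchanged. The price is a larger variance, which is one reason a practical method would need to fit the noise distribution carefully.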

Paper Structure (28 sections, 18 equations, 41 figures, 6 tables, 2 algorithms)

Introduction
Background
Reinforcement Learning
Bellman Error Distribution
Symmetric Q-learning
Assumption of Error Distribution
Normalization of Error Distribution
Training of Noise Distribution
Practical Algorithms
Experiments
Setups
Comparative Evaluation
Comparison at UTD=1 without ensemble
Comparison at UTD=20 with ensemble
Discussion
...and 13 more sections

Figures (41)

Figure 1: Bellman error, negative values of correction noise, and corrected Bellman error from Symmetric REDQ on Hopper-v2. Left: The blue histogram shows the distribution of Bellman errors. The orange histogram represents the distribution of the negative noise added to reduce skewness. It can be observed that the noise distribution fits well with the negative Bellman errors. Right: The green histogram represents the distribution of Bellman errors after adding correction noise. The skewness decreased compared to the blue distribution.
Figure 2: The pre-corrected Bellman error at three different steps when learning Walker2d with SymREDQ.
Figure 3: Comparison of SymSAC, SAC and $\mathcal{X}$-SAC without ensembles for UTD=1.
Figure 4: Comparison of SymREDQ, REDQ and $\mathcal{X}$-REDQ for UTD=20.
Figure 5: Top: density of the pre-corrected Bellman error (blue) and of the negative values of the noise used for correction (orange), showing how closely the distribution of $\eta$ approaches the distribution of $- \epsilon$. Bottom: density of the post-corrected error (green), which is the sum of the pre-corrected error and the noise. The corrected distribution (green) is more symmetric than the pre-corrected distribution (blue).
...and 36 more figures
