Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning

Yi Shen; Hanyan Huang

TL;DR

The grid-mapping pseudo-count method (GPC) is a novel pseudo-count method that extends count-based methods from discrete to continuous environments. Combining it with the soft actor-critic algorithm (SAC) yields a new algorithm, GPC-SAC.

Abstract

Offline reinforcement learning learns from a static dataset without interacting with the environment, which makes it safe and therefore promising for real-world applications. However, directly applying a naive reinforcement learning algorithm usually fails in the offline setting, because out-of-distribution (OOD) state-action pairs lead to inaccurate Q-value approximation. Penalizing the Q-values of OOD state-action pairs is an effective way to address this problem. Among such penalty methods, count-based methods achieve good results in discrete domains while remaining simple. Inspired by this, the Grid-Mapping Pseudo-Count method (GPC) is proposed, a novel pseudo-count method that extends the count-based approach from discrete to continuous domains. First, the continuous state and action spaces are mapped to a discrete space by grid mapping, and the Q-values of OOD state-action pairs are then constrained through pseudo-counts. Second, a theoretical proof shows that GPC obtains appropriate uncertainty constraints under fewer assumptions than other pseudo-count methods. Third, GPC is combined with the Soft Actor-Critic algorithm (SAC) to obtain a new algorithm, GPC-SAC. Finally, experiments on the D4RL datasets show that GPC-SAC achieves better performance at lower computational cost than other algorithms that constrain the Q-value.
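
The grid-mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform grid, the per-dimension bounds `low` and `high`, the resolution `bins`, and the helper names `grid_index`, `build_counts`, and `pseudo_count` are all assumptions made for illustration. The sketch maps each concatenated state-action vector to a grid cell and uses the number of dataset samples in that cell as the pseudo-count.

```python
from collections import Counter

import numpy as np


def grid_index(x, low, high, bins):
    """Map a continuous vector to a tuple of integer grid-cell indices.

    Assumption: a uniform grid with `bins` cells per dimension over [low, high].
    """
    x = np.asarray(x, dtype=float)
    scaled = (x - low) / (high - low)  # rescale each dimension to [0, 1]
    # Clip so that points on or beyond the bounds fall into the edge cells.
    idx = np.clip(np.floor(scaled * bins).astype(int), 0, bins - 1)
    return tuple(idx.tolist())


def build_counts(states, actions, low, high, bins):
    """Count how many dataset state-action pairs land in each grid cell."""
    sa = np.concatenate([states, actions], axis=1)
    return Counter(grid_index(row, low, high, bins) for row in sa)


def pseudo_count(counts, state, action, low, high, bins):
    """Return the pseudo-count n(s, a): the occupancy of the cell containing (s, a)."""
    sa = np.concatenate([state, action])
    return counts[grid_index(sa, low, high, bins)]  # Counter yields 0 for unseen cells


# Toy dataset (hypothetical): 1-D states and 1-D actions in [0, 1].
low, high, bins = np.zeros(2), np.ones(2), 4
states = np.array([[0.10], [0.12], [0.90]])
actions = np.array([[0.20], [0.22], [0.80]])
counts = build_counts(states, actions, low, high, bins)

in_data = pseudo_count(counts, np.array([0.05]), np.array([0.05]), low, high, bins)
ood = pseudo_count(counts, np.array([0.50]), np.array([0.50]), low, high, bins)
```

In this toy dataset the query near the first two samples shares their cell, while the query at (0.5, 0.5) lands in an empty cell and would be treated as OOD. A coarser grid generalizes counts over larger regions, and a finer grid marks more of the space as unseen.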

Paper Structure

This paper contains 31 sections, 4 theorems, 73 equations, 9 figures, 5 tables, 1 algorithm.

Introduction
Related Work
Preliminaries
Grid-Mapping Uncertainty For Offline RL
Grid-Mapping Pseudo-Count Method
Agent Learning With Uncertainty
Connection Between Pseudo-Count And Uncertainty
GPC-SAC Algorithm
Experiment
Experiment in Gym
Experiment setting
Result analysis
Experiment in Adroit
Experiment in other environment
Experiment in Maze2d
...and 16 more sections

Key Result

Lemma 1

With an appropriately chosen hyperparameter $\alpha$, $u(s,a) = \alpha \sqrt{\frac{\ln T}{n(s,a)}}$ is a suitable uncertainty constraint in discrete offline RL.
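
To make the formula in Lemma 1 concrete, the following is a small sketch of how $u(s,a)$ could be evaluated from a count. Several choices here are assumptions and do not come from the paper: $T$ is treated as the total number of samples in the dataset (this excerpt does not define it), the function name `count_uncertainty` is hypothetical, and counts are floored at `n_min` so that unseen pairs with $n(s,a) = 0$ receive a finite value.

```python
import math


def count_uncertainty(n, T, alpha=1.0, n_min=1):
    """Evaluate u(s, a) = alpha * sqrt(ln T / n(s, a)).

    Assumptions: T is the dataset size, and counts are floored at n_min
    to avoid division by zero for unseen state-action pairs.
    """
    n = max(n, n_min)
    return alpha * math.sqrt(math.log(T) / n)


# Uncertainty shrinks as the count grows (toy numbers, T = 10_000 samples).
u_rare = count_uncertainty(n=1, T=10_000)
u_common = count_uncertainty(n=400, T=10_000)
```

Because the value scales with $1/\sqrt{n(s,a)}$, a pair visited 400 times gets one twentieth of the uncertainty of a pair visited once, and $\alpha$ sets the overall strength of the constraint.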

Figures (9)

Figure 1: obtaining pseudo-counting through GPC
Figure 1: The ablation on the learning rate
Figure 2: Gym training curve
Figure 2: The ablation on the Q-function update
Figure 3: Gym training curve
...and 4 more figures

Theorems & Definitions (11)

Lemma 1
Definition 1
Definition 2
Lemma 2
Corollary 1
Theorem 1
proof
proof
proof
proof
...and 1 more
