Bellman Error Centering

Xingguo Chen; Yu Gong; Shangdong Yang; Wenhao Wang

Abstract

This paper revisits the recently proposed reward centering algorithms, simple reward centering (SRC) and value-based reward centering (VRC), and shows that SRC is indeed reward centering, whereas VRC is essentially Bellman error centering (BEC). Based on BEC, we derive the centered fixpoint for tabular value functions and the centered TD fixpoint for linear value function approximation. We design the on-policy CTD algorithm and the off-policy CTDC algorithm, and prove that both converge. Finally, we experimentally validate the stability of the proposed algorithms. Bellman error centering also facilitates extension to a variety of reinforcement learning algorithms.
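The distinction between SRC and VRC can be made concrete with a single tabular update step. The sketch below is illustrative only: the abstract does not state the update rules, so the forms used here (a running average of rewards for SRC, an offset driven by the TD error for VRC) are assumptions based on how reward centering is commonly written, and the names `src_step`, `vrc_step`, `alpha`, and `beta` are hypothetical. Under these assumptions, the SRC offset tracks the mean reward, while the VRC offset moves until the mean TD error is zero, which is one way to read the claim that VRC centers the Bellman error.

```python
import numpy as np


def td_error(v, r_bar, s, r, s_next, gamma):
    """Centered TD error: (r - r_bar) + gamma * v[s'] - v[s]."""
    return r - r_bar + gamma * v[s_next] - v[s]


def src_step(v, r_bar, s, r, s_next, alpha, beta, gamma):
    """One tabular step of simple reward centering (assumed form).

    The offset r_bar is a running average of the observed rewards.
    """
    delta = td_error(v, r_bar, s, r, s_next, gamma)
    v_new = v.copy()
    v_new[s] += alpha * delta
    r_bar_new = r_bar + beta * (r - r_bar)  # driven by the reward
    return v_new, r_bar_new


def vrc_step(v, r_bar, s, r, s_next, alpha, beta, gamma):
    """One tabular step of value-based reward centering (assumed form).

    The offset r_bar is driven by the TD error itself, so it settles
    where the average TD error vanishes.
    """
    delta = td_error(v, r_bar, s, r, s_next, gamma)
    v_new = v.copy()
    v_new[s] += alpha * delta
    r_bar_new = r_bar + beta * delta  # driven by the TD error
    return v_new, r_bar_new


# Hypothetical two-state example: transition from state 0 to state 1 with reward 1.
v0 = np.array([1.0, 4.0])
v_src, r_bar_src = src_step(v0, 0.5, 0, 1.0, 1, alpha=0.5, beta=0.1, gamma=0.5)
v_vrc, r_bar_vrc = vrc_step(v0, 0.5, 0, 1.0, 1, alpha=0.5, beta=0.1, gamma=0.5)
```

Both steps apply the same value update here; only the offset update differs, which is the point of contrast the abstract draws.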
