Distributional Bellman Operators over Mean Embeddings

Li Kevin Wenliang; Grégoire Delétang; Matthew Aitchison; Marcus Hutter; Anian Ruoss; Arthur Gretton; Mark Rowland

TL;DR

This paper proposes a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions, and shows that the approach can be straightforwardly combined with deep reinforcement learning.

Abstract

We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework, provide asymptotic convergence theory, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning, and obtain a new deep RL agent that improves over baseline distributional approaches on the Arcade Learning Environment.

Paper Structure (32 sections, 7 theorems, 38 equations, 13 figures, 1 table, 1 algorithm)

Introduction
Background
Distributional RL and Bellman equation
Statistical functionals and sketches
The Bellman sketch framework
General sketches
Sketch-DP at work
Convergence analysis
Concrete example
Experiments
Deep reinforcement learning
Related work
Conclusion
Proofs
Further details and extensions
...and 17 more sections

Key Result

Proposition 4.0

(Regression error to Bellman approximation.) Let $\|\cdot\|$ be a norm on $\mathbb{R}^m$. Then for any RDF $\eta \in \mathscr{P}([G_{\text{min}}, G_{\text{max}}])^{\mathcal{X}}$, we have

Figures (13)

Figure 1: Example of a DP update for state 1 with child states 2 and 3 and return distributions $\eta_2$ and $\eta_3$. In exact distributional DP, the $\eta$'s are scaled and shifted by $f_{r}(g)=r+\gamma g$, and then weighted by the transition probabilities. In SFDP (Rowland et al., 2019), the map $\iota$ imputes, from initial sketch values $U$, approximate (e.g. categorical) distributions to which distributional DP is applied, followed by evaluating the sketch $\psi$. In our approach, Sketch-DP, the updates are computed directly in the mean embedding space, facilitated by the Bellman coefficients $B_r$, avoiding the imputation step.
Figure 2: An example run of Sketch-DP. A, The MRP considered here. B, The first 5 of $m=13$ sinusoidal feature functions $\phi$. The regression in Equation (\ref{eq:regression}) is performed over a densely spaced grid on the white region $[-4, 4]$. C, Evolution of the estimated mean embeddings from initialisation (grey dot), projected onto the first two principal components. Crosses represent the ground-truth mean embeddings. D, Ground-truth return distributions (estimated by Monte Carlo) and their categorical projections onto a regular grid. E, Imputed distributions from the mean embeddings onto the same grid for selected iterations (curves), compared against the categorical projections (stems).
Figure 3: The objects and structure used to analyse Sketch-DP.
Figure 4: Results of running Sketch-DP (Algorithm \ref{alg:dp}) on tabular environments.
Figure 5: Median (left) and mean (right) human-normalised scores on the Atari 57 suite.
...and 8 more figures
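
The update described in the Figure 1 caption can be illustrated with a small sketch. It is not the paper's algorithm as published: it assumes polynomial features $\phi(g) = (1, g, g^2)$, for which the Bellman coefficients $B_r$ satisfying $\phi(r + \gamma g) = B_r \phi(g)$ are available in closed form, whereas the caption of Figure 2 indicates that the paper obtains them by regression for other feature maps. All function names, rewards, transition probabilities, and distributions below are hypothetical and chosen only for illustration. The sketch compares the exact distributional DP update (shift and scale atoms by $f_r$, then mix) with the update computed purely on mean embeddings.

```python
# Illustrative sketch only: names and numbers are hypothetical, not from the paper.
GAMMA = 0.9  # assumed discount factor


def phi(g):
    """Polynomial feature map (1, g, g^2); an assumption made so B_r is exact."""
    return [1.0, g, g * g]


def mean_embedding(dist):
    """Mean embedding E[phi(G)] of a discrete distribution given as {atom: prob}."""
    emb = [0.0, 0.0, 0.0]
    for g, p in dist.items():
        for i, f in enumerate(phi(g)):
            emb[i] += p * f
    return emb


def bellman_coefficients(r, gamma=GAMMA):
    """Matrix B_r such that phi(r + gamma * g) equals B_r applied to phi(g)."""
    return [
        [1.0, 0.0, 0.0],
        [r, gamma, 0.0],
        [r * r, 2.0 * r * gamma, gamma * gamma],
    ]


def matvec(mat, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def exact_dp_update(children, gamma=GAMMA):
    """Exact distributional DP: map each child's atoms through f_r, then mix."""
    out = {}
    for prob, r, dist in children:
        for g, p in dist.items():
            atom = r + gamma * g
            out[atom] = out.get(atom, 0.0) + prob * p
    return out


def sketch_dp_update(children, gamma=GAMMA):
    """Embedding-space update: U(x) = sum over x' of P(x'|x) * B_r * U(x')."""
    new = [0.0, 0.0, 0.0]
    for prob, r, emb in children:
        for i, v in enumerate(matvec(bellman_coefficients(r, gamma), emb)):
            new[i] += prob * v
    return new


# State 1 moves to child state 2 or 3; each entry is (probability, reward, child).
eta_2 = {0.0: 0.5, 2.0: 0.5}
eta_3 = {-1.0: 0.25, 1.0: 0.75}
transitions = [(0.3, 1.0, eta_2), (0.7, -0.5, eta_3)]

# Route 1: update the distribution exactly, then embed it.
eta_1 = exact_dp_update(transitions)
u_exact = mean_embedding(eta_1)

# Route 2: embed the children first, then update in embedding space only.
u_sketch = sketch_dp_update(
    [(p, r, mean_embedding(d)) for p, r, d in transitions]
)
```

For these particular features the two routes agree up to floating-point error, because $\phi(r + \gamma g)$ is exactly linear in $\phi(g)$. With features such as the sinusoidal functions in Figure 2, the relation would hold only approximately, which is where a regression step for $B_r$ would come in.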

Theorems & Definitions (15)

Definition 2.1: Mean embedding sketches
Remark 3.1: Invariance
Remark 3.2: The need for linear regression
Remark 3.3: Linear update
Proposition 4.0
Proposition 4.0
Proposition 4.0
proof
Proposition 4.0
Proposition 1.0
...and 5 more
