Distributional Bellman Operators over Mean Embeddings

Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland

TL;DR

The paper proposes a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions, and shows that this approach can be straightforwardly combined with deep reinforcement learning.

Abstract

We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework, provide asymptotic convergence theory, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning, and obtain a new deep RL agent that improves over baseline distributional approaches on the Arcade Learning Environment.

Paper Structure

This paper contains 32 sections, 7 theorems, 38 equations, 13 figures, 1 table, 1 algorithm.

Key Result

Proposition 4.0

(Regression error to Bellman approximation.) Let $\|\cdot\|$ be a norm on $\mathbb{R}^m$. Then for any RDF $\eta \in \mathscr{P}([G_{\text{min}}, G_{\text{max}}])^{\mathcal{X}}$, we have

Figures (13)

  • Figure 1: Example of a DP update for state 1 with child states 2 and 3, which have return distributions $\eta_2$ and $\eta_3$. In exact distributional DP, the $\eta$'s are scaled and shifted by $f_{r}(g)=r+\gamma g$ and then weighted by the transition probabilities. In SFDP (Rowland et al., 2019), the map $\iota$ imputes, from initial sketch values $U$, approximate (e.g. categorical) distributions to which the distributional DP update is applied, followed by evaluating the sketch $\psi$. In our approach, Sketch-DP, the updates are computed directly in the mean embedding space, facilitated by the Bellman coefficients $B_r$, avoiding the imputation step.
  • Figure 2: An example run of Sketch-DP. A, The MRP considered here. B, The first 5 of $m=13$ sinusoidal feature functions $\phi$. The regression in Equation (\ref{eq:regression}) is performed over a densely spaced grid on the white region $[-4, 4]$. C, Evolution of the estimated mean embeddings from initialisation (grey dot), projected onto the first two principal components. Crosses represent the ground-truth mean embeddings. D, Ground-truth return distributions (estimated by Monte Carlo) and their categorical projections onto a regular grid. E, Distributions imputed from the mean embeddings onto the same grid for selected iterations (curves), compared against the categorical projections (stems).
  • Figure 3: The objects and structure used to analyse Sketch-DP.
  • Figure 4: Results of running Sketch-DP (Algorithm \ref{alg:dp}) on tabular environments.
  • Figure 5: Median (left) and mean (right) human-normalised scores on the Atari 57 suite.
  • ...and 8 more figures
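
The update described in the captions of Figures 1 and 2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`features`, `bellman_coefficients`, `sketch_dp_sweep`), the particular sinusoidal frequencies, the grid size, the three-state MRP, and the assumption of one deterministic reward per state are all hypothetical choices made for illustration. Only the overall recipe follows the captions: fit Bellman coefficients $B_r$ by linear regression of $\phi(r+\gamma g)$ on $\phi(g)$ over a dense grid on $[-4, 4]$, then update the mean embeddings linearly, with no imputation step.

```python
import numpy as np


def features(g, m=13, lo=-4.0, hi=4.0):
    """Hypothetical feature map phi: one constant plus (m - 1) sinusoids."""
    g = np.atleast_1d(np.asarray(g, dtype=float))
    k = np.arange(1, (m - 1) // 2 + 1)
    # Assumed frequencies: multiples of pi / (hi - lo).
    angle = np.pi * np.outer(g - lo, k) / (hi - lo)
    return np.hstack([np.ones((g.size, 1)), np.sin(angle), np.cos(angle)])


def bellman_coefficients(r, gamma, grid):
    """Least-squares fit of B_r such that phi(r + gamma * g) ~= B_r phi(g)."""
    phi = features(grid)                    # shape (n, m)
    phi_next = features(r + gamma * grid)   # shape (n, m)
    solution, *_ = np.linalg.lstsq(phi, phi_next, rcond=None)
    return solution.T                       # shape (m, m)


def sketch_dp_sweep(U, P, B):
    """One sweep: U[x] <- B[x] @ sum_x' P[x, x'] U[x'] (reward fixed per state)."""
    return np.einsum("xij,xj->xi", B, P @ U)


# Assumed toy MRP: state 0 moves to state 1 or 2 with equal probability;
# states 1 and 2 are absorbing. Rewards are deterministic per state.
grid = np.linspace(-4.0, 4.0, 201)
gamma = 0.5
rewards = np.array([0.0, 1.0, -1.0])
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

B = np.stack([bellman_coefficients(r, gamma, grid) for r in rewards])

# Initialise every state's embedding as the embedding of a point mass at 0.
U = np.repeat(features(0.0), len(rewards), axis=0)
for _ in range(5):
    U = sketch_dp_sweep(U, P, B)
```

Because the assumed feature map contains a constant feature, the fitted coefficients reproduce it exactly, so the first coordinate of each embedding stays at 1 across sweeps. How close the remaining coordinates get to the true mean embeddings depends on the regression error of the chosen features, which is the quantity that the Key Result above relates to the Bellman approximation.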

Theorems & Definitions (15)

  • Definition 2.1: Mean embedding sketches
  • Remark 3.1: Invariance
  • Remark 3.2: The need for linear regression
  • Remark 3.3: Linear update
  • Proposition 4.0
  • Proposition 4.0
  • Proposition 4.0
  • proof
  • Proposition 4.0
  • Proposition 1.0
  • ...and 5 more