An Optimal Tightness Bound for the Simulation Lemma

Sam Lobel, Ronald Parr

TL;DR

This work presents a bound for value-prediction error with respect to model misspecification that is tight, including constant factors, and derives a bound that is sub-linear with respect to transition function misspecification.

Abstract

We present a bound for value-prediction error with respect to model misspecification that is tight, including constant factors. This is a direct improvement of the "simulation lemma," a foundational result in reinforcement learning. We demonstrate that existing bounds are quite loose, becoming vacuous for large discount factors, due to the suboptimal treatment of compounding probability errors. By carefully considering this quantity on its own, instead of as a subcomponent of value error, we derive a bound that is sub-linear with respect to transition function misspecification. We then demonstrate broader applicability of this technique, improving a similar bound in the related subfield of hierarchical abstraction.

Paper Structure

This paper contains 17 sections, 1 theorem, 33 equations, 2 figures.

Key Result

Theorem 1

For two MDPs $\mathcal{M}$ and $\hat{\mathcal{M}}$ related as described in Equations eq:original-sim-lemma-t-condition and eq:original-sim-lemma-r-condition, the following inequality holds: [...] Furthermore, this bound is tight.

Figures (2)

  • Figure 1: Visualization of the relation between $L_1$ distance and overlap of two probability distributions (Equation \ref{eq:tvd-l1-equivalence}). The blue and orange shaded regions together comprise the $L_1$ distance. The brown region represents overlap. Overlap plus either the blue or the orange section constitutes a probability distribution, and therefore has total area $1$. Thus the blue and orange regions each have area ${\lVert p - \hat{p}\rVert_1 / 2}$, and so ${\lVert \bar{p}\rVert_1 = 1 - \lVert p - \hat{p}\rVert_1 /2}$.
  • Figure 2: Bounds on value error given by the original simulation lemma and by our tighter bounds, normalized by $V_{MAX}$. (Left) The bound on value error as $\gamma$ increases shows the original lemma's suboptimality with respect to the discount factor. (Right) The bound on value error as misspecification increases shows the looseness of the linear approximation compared to the tight bound.
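
The identity in the Figure 1 caption can be checked numerically. The following is a minimal sketch, assuming discrete distributions and taking the overlap $\bar{p}$ to be the pointwise minimum of $p$ and $\hat{p}$; the function names and the example distributions are hypothetical and chosen only for illustration.

```python
def l1_distance(p, p_hat):
    """L1 distance between two discrete distributions given as lists."""
    return sum(abs(a - b) for a, b in zip(p, p_hat))


def overlap_mass(p, p_hat):
    """Total mass of the overlap, assumed here to be the pointwise minimum."""
    return sum(min(a, b) for a, b in zip(p, p_hat))


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.3, 0.2]
p_hat = [0.2, 0.3, 0.5]

l1 = l1_distance(p, p_hat)
overlap = overlap_mass(p, p_hat)

# The caption's identity: overlap mass equals 1 - ||p - p_hat||_1 / 2.
assert abs(overlap - (1 - l1 / 2)) < 1e-12
```

The check relies only on both inputs summing to 1: since $\min(a, b) = (a + b - |a - b|)/2$, summing over outcomes gives the stated relation.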

Theorems & Definitions (1)

  • Theorem 1