On the continuity and smoothness of the value function in reinforcement learning and optimal control

Hans Harder, Sebastian Peitz

TL;DR

It is shown that the value function is always Hölder continuous under relatively weak assumptions on the underlying system and that non-differentiable value functions can be made differentiable by slightly “disturbing” the system.

Abstract

The value function plays a crucial role as a measure of the cumulative future reward an agent receives in both reinforcement learning and optimal control. It is therefore of interest to study how similar the values of neighboring states are, i.e., to investigate the continuity of the value function. We do so by providing and verifying upper bounds on the value function's modulus of continuity. Additionally, we show that the value function is always Hölder continuous under relatively weak assumptions on the underlying system and that non-differentiable value functions can be made differentiable by slightly "disturbing" the system.
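
For reference, the two notions named in the abstract can be written as follows. These are the standard textbook definitions for a real-valued function $v$ on a metric space $(S, d)$; the symbols $\omega_v$, $C$, and $\alpha$ are notation assumed here and may differ from the paper's own.

$$\omega_v(\delta) = \sup\{\, |v(x) - v(y)| : x, y \in S,\ d(x, y) \le \delta \,\}, \qquad \delta \ge 0$$

$$|v(x) - v(y)| \le C\, d(x, y)^{\alpha} \quad \text{for all } x, y \in S, \quad C \ge 0,\ \alpha \in (0, 1]$$

The first line defines the modulus of continuity of $v$; the second states Hölder continuity with exponent $\alpha$, which is equivalent to the bound $\omega_v(\delta) \le C\,\delta^{\alpha}$ for all $\delta \ge 0$.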

Paper Structure

This paper contains 9 sections, 10 theorems, 57 equations, and 3 figures.

Key Result

Proposition 1

Let $\Phi(x) = 4x(1-x)$ be the logistic map on $S=[0,1]$, put $r(x) = x$ and let $\gamma \in [\frac{1}{2}, 1)$. Then $v(x)=\sum_{n=0}^\infty \gamma^n r(\Phi^n(x))$ is nowhere differentiable.
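
The series in Proposition 1 can be approximated numerically by truncating it after finitely many terms. The following is a minimal sketch, not the authors' code: the function names `logistic`, `value`, and `truncation_bound` and the default `n_terms = 200` are assumptions made for illustration. Because $r$ takes values in $[0, 1]$, the neglected tail is at most $\gamma^N / (1 - \gamma)$ when $N$ terms are kept.

```python
def logistic(x: float) -> float:
    """Logistic map Phi(x) = 4x(1 - x) on S = [0, 1]."""
    return 4.0 * x * (1.0 - x)


def value(x: float, gamma: float = 0.8, n_terms: int = 200) -> float:
    """Truncated series sum_{n < n_terms} gamma^n * r(Phi^n(x)) with r(x) = x.

    n_terms is a hypothetical truncation length chosen for this sketch.
    """
    total = 0.0
    discount = 1.0  # holds gamma^n
    for _ in range(n_terms):
        total += discount * x  # reward r(x) = x at the current state
        x = logistic(x)        # advance the state by one step
        discount *= gamma
    return total


def truncation_bound(gamma: float, n_terms: int) -> float:
    """Upper bound on the neglected tail, valid because 0 <= r <= 1."""
    return gamma ** n_terms / (1.0 - gamma)


if __name__ == "__main__":
    # Sample v on a coarse grid; a finer grid reveals the rough graph.
    for i in range(11):
        x0 = i / 10
        print(f"v({x0:.1f}) ~ {value(x0):.6f}")
```

Two easy sanity checks: $x = 3/4$ is a fixed point of $\Phi$, so $v(3/4) = 0.75 / (1 - \gamma)$, and $x = 1/2$ is mapped to $1$ and then to $0$, so $v(1/2) = 1/2 + \gamma$. For generic starting points, floating-point rounding errors are amplified by the chaotic dynamics, so the computed values should be treated as approximations only.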

Figures (3)

  • Figure 1: The value function $v$ from \ref{thr:logistic} for the discount factor $\gamma = 0.8$. The "smoothed" version $w$ is the value function that one obtains when disturbing the same system using Gaussian noise with standard deviation $\sigma = 0.01$, cf. \ref{thr:differentiability}.
  • Figure 2: Visual depiction of the idea in \ref{thr:hoelder-integrals}.
  • Figure 3: Left: The value functions corresponding to the example in \ref{sec:sharpness_and_example} for $L = 1.5$ and discount factors $(\gamma_0, \gamma_1, \gamma_2) = (0.5, 0.9, 0.99)$ when normalized to a maximal value of $1$. Right: The moduli of continuity of the value functions in comparison to the bounds given by \ref{thr:hoelder-sums}, visualized by dashed lines in the same color. The bounds from \cite{bernsteinAdaptiveresolutionReinforcementLearning2010} are visualized using dotted lines (also in the same color).
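
A modulus of continuity like the ones plotted in Figure 3 can be estimated from samples of a function on a uniform grid. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: `empirical_modulus` and its parameters are hypothetical names, and the estimate only compares grid points, so it is a lower bound on the true modulus $\omega(\delta) = \sup_{|x - y| \le \delta} |v(x) - v(y)|$.

```python
import math
from typing import List, Sequence, Tuple


def empirical_modulus(
    values: Sequence[float], spacing: float, max_steps: int
) -> List[Tuple[float, float]]:
    """Estimate the modulus of continuity from samples on a uniform grid.

    values[i] is the function value at the i-th grid point and spacing is
    the distance between neighbouring grid points. Returns pairs
    (delta, omega) for delta = spacing, 2 * spacing, ..., max_steps * spacing.
    Requires max_steps < len(values).
    """
    n = len(values)
    result = []
    running = 0.0
    for k in range(1, max_steps + 1):
        # Largest difference between samples exactly k grid steps apart.
        gap = max(abs(values[i + k] - values[i]) for i in range(n - k))
        # The modulus is non-decreasing in delta, so keep a running maximum.
        running = max(running, gap)
        result.append((k * spacing, running))
    return result


if __name__ == "__main__":
    # Example: sqrt on [0, 1] is Hoelder continuous with exponent 1/2.
    grid = [math.sqrt(i / 100) for i in range(101)]
    for delta, omega in empirical_modulus(grid, 0.01, 5):
        print(f"delta = {delta:.2f}, omega ~ {omega:.4f}")
```

For the square root on $[0, 1]$, the largest differences occur at $0$, so the estimate reproduces $\omega(\delta) = \sqrt{\delta}$ at the grid distances, which matches Hölder continuity with exponent $1/2$.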

Theorems & Definitions (24)

  • Proposition 1: see \cite{yamagutiWeierstrassFunctionChaos1983}
  • Remark 1
  • Proposition 2
  • proof
  • Definition 1
  • Example 1
  • Remark 2
  • Remark 3
  • Theorem 1
  • proof
  • ...and 14 more