On the grid-sampling limit SDE

Christian Bender, Nguyen Tran Thuan

TL;DR

The grid-sampling SDE is introduced as a proxy for modeling exploration in continuous-time reinforcement learning and its wellposedness in the presence of jumps is discussed.

Abstract

In our recent work [3] we introduced the grid-sampling SDE as a proxy for modeling exploration in continuous-time reinforcement learning. In this note, we provide further motivation for the use of this SDE and discuss its wellposedness in the presence of jumps.

Paper Structure

This paper contains 5 sections, 5 theorems, 61 equations.

Key Result

Theorem 2.2

There is a ${\mathbb P}$-null set $\mathcal{N}$ such that for every $\omega\in \Omega\backslash \mathcal{N}$ and $A\in \mathcal{B}([0,T])\otimes \mathcal{B}(\mathbb R)$, where in slight abuse of notation we write $\bm{\lambda}_{[0,1]}(B)=\bm{\lambda}_{\mathbb R}(B\cap [0,1])$ for $B\in \mathcal{B}(\mathbb R)$.

Theorems & Definitions (11)

  • Definition 2.1
  • Theorem 2.2
  • proof
  • Theorem 3.2
  • proof : Sketch of the proof.
  • Theorem 4.1: Theorem 2.7 in BN24
  • Theorem 4.2
  • proof : Proof of \ref{thm:well-posedness-SDE}
  • Remark 4.3
  • Proposition A.1
  • ...and 1 more