Exploratory Optimal Stopping: A Singular Control Formulation

Jodi Dianetti; Giorgio Ferrari; Renyuan Xu

TL;DR

This paper explores continuous-time and state-space optimal stopping problems from a reinforcement learning perspective and derives a semi-explicit solution to the regularized problem, allowing us to assess the impact of entropy regularization and analyze the vanishing entropy limit.

Abstract

This paper explores continuous-time and state-space optimal stopping problems from a reinforcement learning perspective. We begin by formulating the stopping problem using randomized stopping times, where the decision maker's control is represented by the probability of stopping within a given time; specifically, the control is a bounded, non-decreasing, càdlàg process. To encourage exploration and facilitate learning, we introduce a regularized version of the problem by penalizing it with the cumulative residual entropy of the randomized stopping time. The regularized problem takes the form of an (n+1)-dimensional degenerate singular stochastic control problem with finite fuel. We address this through the dynamic programming principle, which enables us to identify the unique optimal exploratory strategy. For the specific case of a real option problem, we derive a semi-explicit solution to the regularized problem, allowing us to assess the impact of entropy regularization and analyze the vanishing entropy limit. Finally, we propose a reinforcement learning algorithm based on policy iteration, and we show both policy improvement and policy convergence results for it.
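
To make the entropy penalty in the abstract concrete, the following is a minimal sketch that assumes the standard definition of cumulative residual entropy, $-\int_0^\infty S(t)\log S(t)\,dt$, where $S(t) = 1 - \xi_t$ is the probability of not having stopped by time $t$ and $\xi_t$ is the control described above. The paper's exact functional may include discounting or a temperature weight, which this sketch omits. The function name, the truncation horizon, and the grid size are illustrative choices, not taken from the paper.

```python
import math


def cumulative_residual_entropy(survival, horizon, steps=100_000):
    """Approximate -int_0^horizon S(t) log S(t) dt with the trapezoidal rule.

    `survival(t)` returns S(t) = P(tau > t) = 1 - xi_t (assumed in [0, 1]).
    `horizon` truncates the integral; it is a hypothetical cutoff chosen so
    that the neglected tail is negligible.
    """
    def integrand(t):
        s = survival(t)
        # Convention: 0 * log 0 = 0.
        return 0.0 if s <= 0.0 else -s * math.log(s)

    dt = horizon / steps
    total = 0.5 * (integrand(0.0) + integrand(horizon))
    for k in range(1, steps):
        total += integrand(k * dt)
    return total * dt


# Randomized stopping at an exponential time with rate 2: S(t) = exp(-2 t).
rate = 2.0
cre_exponential = cumulative_residual_entropy(
    lambda t: math.exp(-rate * t), horizon=40.0
)

# Deterministic stopping at time 1: S(t) is 1 before time 1 and 0 afterwards.
cre_deterministic = cumulative_residual_entropy(
    lambda t: 1.0 if t < 1.0 else 0.0, horizon=40.0
)
```

For the exponential case the integral equals $\int_0^\infty \lambda t e^{-\lambda t}\,dt = 1/\lambda$, so the numerical value should be close to 0.5. The deterministic stopping rule has zero entropy because $S(t)\log S(t)$ vanishes whenever $S(t)$ is 0 or 1, which is one way to see why the penalty rewards genuinely randomized (exploratory) strategies.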

Paper Structure (28 sections, 12 theorems, 198 equations, 3 figures, 4 algorithms)

Introduction
Our work and contributions
Related literature
Randomized optimal stopping and singular control.
Continuous-time RL for regular control.
Machine learning for optimal stopping.
Outline of the paper
General notation
Exploratory formulation and entropy regularization of the OS problem
Exploratory formulation via singular controls
Entropy regularization
Vanishing entropy limit
Solving the entropy regularized OS problem via dynamic programming
Preliminary estimates
Proof of the value function theorem: solution to the HJB
...and 13 more sections

Key Result

Proposition 2.2

Under the paper's first assumption, there exists an optimal stopping time.

Figures (3)

Figure 1: Demonstration of the Policy Iteration Algorithm.
Figure 2: Exponential initialization. Left: Ground-truth and learned $g$ function in selected iterations. Right: Convergence to the ground truth in $L_1$ norm (outer iterations).
Figure 3: Linear initialization. Left: Ground-truth and learned $g$ function in selected iterations. Right: Convergence to the ground truth in $L_1$ norm (outer iterations).

Theorems & Definitions (31)

Proposition 2.2
proof
Lemma 2.3
Proposition 2.4
proof
Remark 2.5: Non-exploratory behavior of the optimal controls
Remark 2.6
Remark 2.7: Connection between optimal stopping and singular control
Remark 2.8
Proposition 2.9
...and 21 more
