Exploratory Optimal Stopping: A Singular Control Formulation

Jodi Dianetti, Giorgio Ferrari, Renyuan Xu

TL;DR

This paper studies continuous-time, continuous-state-space optimal stopping problems from a reinforcement learning perspective. For a real option problem, it derives a semi-explicit solution to the entropy-regularized problem, which makes it possible to assess the impact of entropy regularization and to analyze the vanishing entropy limit.

Abstract

This paper explores continuous-time and state-space optimal stopping problems from a reinforcement learning perspective. We begin by formulating the stopping problem using randomized stopping times, where the decision maker's control is represented by the probability of stopping within a given time, specifically a bounded, non-decreasing, càdlàg control process. To encourage exploration and facilitate learning, we introduce a regularized version of the problem by penalizing it with the cumulative residual entropy of the randomized stopping time. The regularized problem takes the form of an (n+1)-dimensional degenerate singular stochastic control problem with finite fuel. We address this through the dynamic programming principle, which enables us to identify the unique optimal exploratory strategy. For the specific case of a real option problem, we derive a semi-explicit solution to the regularized problem, allowing us to assess the impact of entropy regularization and analyze the vanishing entropy limit. Finally, we propose a reinforcement learning algorithm based on policy iteration. We show both policy improvement and policy convergence results for our proposed algorithm.
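
To make the entropy penalty more concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard definition of cumulative residual entropy, $-\int_0^\infty S(t)\log S(t)\,dt$, where $S(t) = 1 - \xi_t$ is the probability of not having stopped by time $t$ and $\xi_t$ is the control process described above. The paper's regularizer may include discounting or a temperature weight that is not shown here. The function name, the constant-rate stopping rule, and the truncation horizon `t_max` are illustrative assumptions.

```python
import math


def cumulative_residual_entropy(survival, t_max, n_steps=100_000):
    """Approximate -int_0^{t_max} S(t) log S(t) dt with the trapezoid rule.

    `survival` maps a time t to S(t) = 1 - xi_t, the probability of not
    having stopped by time t. The integral is truncated at `t_max`.
    """
    def integrand(t):
        s = survival(t)
        # Convention 0 * log 0 = 0: points with S(t) = 0 contribute nothing.
        return -s * math.log(s) if s > 0.0 else 0.0

    dt = t_max / n_steps
    total = 0.5 * (integrand(0.0) + integrand(t_max))
    for k in range(1, n_steps):
        total += integrand(k * dt)
    return total * dt


# Hypothetical randomized rule: stop at a constant rate, so that
# xi_t = 1 - exp(-rate * t) and S(t) = exp(-rate * t).
rate = 2.0
cre = cumulative_residual_entropy(lambda t: math.exp(-rate * t), t_max=40.0)
print(cre)
```

For this assumed exponential rule the integral has the closed form $1/\text{rate}$, so a slower, more spread-out stopping rule has a larger entropy. A deterministic stopping time has $S(t) \in \{0, 1\}$ and therefore zero cumulative residual entropy, which is consistent with using this quantity to reward exploration.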

Paper Structure

This paper contains 28 sections, 12 theorems, 198 equations, 3 figures, 4 algorithms.

Key Result

Proposition 2.2

Under the paper's first assumption (unresolved reference label "assumption first"), there exists an optimal stopping time.

Figures (3)

  • Figure 1: Demonstration of the Policy Iteration Algorithm.
  • Figure 2: Exponential initialization. Left: Ground-truth and learned $g$ function in selected iterations. Right: Convergence to the ground truth in $L_1$ norm (outer iterations).
  • Figure 3: Linear initialization. Left: Ground-truth and learned $g$ function in selected iterations. Right: Convergence to the ground truth in $L_1$ norm (outer iterations).

Theorems & Definitions (31)

  • Proposition 2.2
  • proof
  • Lemma 2.3
  • Proposition 2.4
  • proof
  • Remark 2.5: Non-exploratory behavior of the optimal controls
  • Remark 2.6
  • Remark 2.7: Connection between optimal stopping and singular control
  • Remark 2.8
  • Proposition 2.9
  • ...and 21 more