
Entropy-regularized penalization schemes for American options and reflected BSDEs with singular generators

Daniel Chee, Noufel Frikha, Libo Li

Abstract

This paper extends our previous work in Chee et al. [9] to continuous-time optimal stopping problems, with a particular focus on American options within an exploratory framework. We pursue two main objectives. First, motivated by reinforcement learning applications, we introduce an entropy-regularized penalization scheme for continuous-time optimal stopping problems. The scheme is inspired by classical penalization techniques for reflected backward stochastic differential equations (RBSDEs) and provides a smooth approximation of the degenerate stopping rule inherent to the American option problem. This regularization promotes exploration, enables the use of gradient-based optimization methods, and leads naturally to policy improvement algorithms. We establish well-posedness and convergence properties of the scheme, and illustrate its numerical feasibility through low-dimensional experiments based on policy iteration and least-squares Monte Carlo methods. Second, from a theoretical perspective, we study the asymptotic limit of the entropy-regularized penalization as the penalization parameter tends to infinity. We show that the limiting value process solves a reflected BSDE with a logarithmically singular driver, and we prove existence and uniqueness of solutions to this new class of RBSDEs via a monotone limit argument. To the best of our knowledge, such equations have not previously been investigated in the literature.

Paper Structure (14 sections, 23 theorems, 198 equations, 2 figures, 1 table)

Key Result

Lemma 3.1

For each $n\ge 1$, the entropy-regularized penalization scheme defining $V^{\lambda,n}$ is well posed. In particular, it admits a unique solution $(V^{\lambda,n}, M^{\lambda,n}) \in \mathcal{S}^2 \times \mathcal{H}^2$.

Figures (2)

  • Figure 1: Sketches of $\Phi_{1,n}$ for various $n$ and of $\Phi_{1,\infty}(s,x) =-\ln(x-P_s)\mathds{1}_{\{ x > P_s\}} + \infty\mathds{1}_{\{x \leq P_s\}}$.
  • Figure 2: An illustrative sketch of the truncated generator $\Phi_{\lambda,n}(s,x\vee (P_s+\epsilon))$ for several values of $\epsilon$. The functions are Lipschitz continuous and increase as $\epsilon \rightarrow 0$.
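
The truncation described in the Figure 2 caption can be illustrated numerically on the limiting generator from the Figure 1 caption. The following is a minimal sketch, not the authors' code: it assumes $\lambda = 1$ and $n = \infty$ (the only case whose formula is given here), treats the obstacle $P_s$ as a fixed number `p`, and uses the hypothetical helper names `phi_limit` and `phi_truncated`.

```python
import math


def phi_limit(x, p):
    """Phi_{1,infinity}(s, x) as written in the Figure 1 caption.

    p stands in for the obstacle value P_s (assumed constant here).
    Returns -ln(x - p) when x > p and +infinity otherwise.
    """
    return -math.log(x - p) if x > p else math.inf


def phi_truncated(x, p, eps):
    """Evaluate phi_limit at x v (p + eps), the truncation named in Figure 2.

    Below the level p + eps the function is frozen at -ln(eps), which removes
    the logarithmic singularity at x = p.
    """
    return phi_limit(max(x, p + eps), p)


if __name__ == "__main__":
    p = 1.0  # illustrative obstacle level (assumption)
    xs = [0.5, 1.0, 1.05, 1.5, 3.0]  # illustrative evaluation points
    for eps in (0.5, 0.1, 0.01):
        values = [round(phi_truncated(x, p, eps), 4) for x in xs]
        print(f"eps={eps}: {values}")
```

In this special case the truncated function equals $-\ln(\max(x - P_s, \epsilon))$, so it is Lipschitz in $x$ with constant $1/\epsilon$ and pointwise nondecreasing as $\epsilon$ shrinks, which matches the behaviour the caption describes for $\Phi_{\lambda,n}$.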

Theorems & Definitions (50)

  • Lemma 3.1
  • proof
  • Remark 3.1
  • Theorem 3.1
  • proof
  • Corollary 3.1
  • proof
  • Remark 3.2
  • Theorem 3.2
  • proof
  • ...and 40 more