A Two-fold Randomization Framework for Impulse Control Problems

Haoyang Cao, Yuchao Dong, Zhouhao Yang

TL;DR

The paper introduces a two-fold entropy-regularized randomization framework for impulse control. It couples a randomized nonlocal jump operator $M^{\lambda}$ with a randomized stopping operator $T^{\lambda}$ to form the fixed-point problem $\psi^{\lambda}=T^{\lambda_1}[M^{\lambda_2}\psi^{\lambda}]$. The paper derives a semi-linear HJB equation, proves a verification theorem that guarantees uniqueness, uses an iterative scheme to establish existence and $C^{2,\alpha}_{\mathrm{loc}}$ regularity, and shows convergence to the classical impulse control value as $\lambda\to 0$. Building on this framework, the authors develop an offline reinforcement learning algorithm, prove geometric convergence of its policy improvement step, and provide a TD-based model-free implementation. Numerical experiments on a linear model validate convergence toward the classical solution and illustrate the exploration-exploitation tradeoff via the volatility parameter $\sigma$. The framework thus offers a principled, learnable approximation for classical impulse control problems and lays the groundwork for learning algorithms in high-dimensional settings.

Abstract

We propose and analyze a randomization scheme for a general class of impulse control problems. The solution to this randomized problem is characterized as the fixed point of a compound operator which consists of a regularized nonlocal operator and a regularized stopping operator. This approach allows us to derive a semi-linear Hamilton-Jacobi-Bellman (HJB) equation. Through an equivalent randomization scheme with a Poisson compound measure, we establish a verification theorem that implies the uniqueness of the solution. Via an iterative approach, we prove the existence of the solution. The existence-and-uniqueness result ensures the randomized problem is well-defined. We then demonstrate that our randomized impulse control problem converges to its classical counterpart as the randomization parameter $\bm{\lambda}$ vanishes. This convergence, combined with the value function's $C^{2,\alpha}_{\mathrm{loc}}$ regularity, confirms our framework provides a robust approximation and a foundation for developing learning algorithms. Under this framework, we propose an offline reinforcement learning (RL) algorithm. Its policy improvement step is naturally derived from the iterative approach from the existence proof, which enjoys a geometric convergence rate. We implement a model-free version of the algorithm and numerically demonstrate its effectiveness using a widely studied example. The results show that our RL algorithm can learn the randomized solution, which accurately approximates its classical counterpart. A sensitivity analysis with respect to the volatility parameter $\sigma$ in the state process effectively demonstrates the exploration-exploitation tradeoff.
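
The claim that the randomized problem approaches its classical counterpart as the randomization parameter vanishes can be illustrated with the standard entropy-regularized maximum over a finite set. The sketch below is not the paper's operator: it assumes a maximization convention and a small hypothetical list of candidate values, and it only shows the generic fact that $\lambda\log\sum_i e^{v_i/\lambda}$ lies between $\max_i v_i$ and $\max_i v_i+\lambda\log n$, with the maximizing randomized policy given by Gibbs weights. The names `soft_max`, `gibbs_policy`, and `candidate_values` are illustrative.

```python
import math


def soft_max(values, lam):
    """Entropy-regularized maximum: lam * log(sum(exp(v / lam)))."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    m = max(values)
    # Subtract the maximum before exponentiating for numerical stability.
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))


def gibbs_policy(values, lam):
    """Probabilities proportional to exp(v / lam); they attain the soft maximum."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    m = max(values)
    weights = [math.exp((v - m) / lam) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical candidate values (assumption, not taken from the paper).
candidate_values = [1.0, 0.2, -0.5]

for lam in (1.0, 0.5, 0.1, 0.01):
    value = soft_max(candidate_values, lam)
    policy = gibbs_policy(candidate_values, lam)
    # As lam shrinks, value approaches max(candidate_values) and the
    # policy concentrates on the best candidate.
    print(f"lam={lam}: soft max={value:.6f}, policy={[round(p, 4) for p in policy]}")
```

Larger values of `lam` spread probability across candidates (more exploration), while smaller values concentrate it on the best one (more exploitation), at the price of a gap of at most `lam * log(n)` from the hard maximum.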

Paper Structure

This paper contains 39 sections, 25 theorems, 165 equations, 3 figures, 1 algorithm.

Key Result

Lemma 2.2

Let $X^x=\{X_t^x\}_{t\geq0}$ satisfy \eqref{eq:sde-uncontrolled} with $X_0=x$ for any $x\in\mathbb{R}$. Then,

Figures (3)

  • Figure 1: Comparison in value functions between randomized and classical impulse control with $\bm{\lambda}\in\Lambda$.
  • Figure 2: Sensitivity analysis with respect to volatility $\sigma$ for Algorithm \ref{alg:td_impulse} with $\bm{\lambda}=(0.5,0.5)$.
  • Figure 3: Sensitivity analysis with respect to volatility $\sigma$ for Algorithm \ref{alg:td_impulse} with $\bm{\lambda}=(1.0,1.0),(0.5,0.5)$.

Theorems & Definitions (26)

  • Lemma 2.2
  • Definition 2.4
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Theorem 3.5
  • Lemma 3.6
  • Theorem 4.1
  • Lemma 4.2
  • ...and 16 more