A Convergence-Guaranteed Algorithm for Stochastic Optimal Control Problems

Mohsen Amidzadeh

Abstract

Stochastic optimal control problems (SOCPs) play a major role in sequential decision-making. Various iterative algorithms, built within the framework of the stochastic maximum principle, sequentially find the optimal control decision. However, they rely on adjoint sensitivity analysis, which requires simulating an adjoint process, typically a backward stochastic differential equation (SDE) that must simultaneously be adapted to a forward filtration and satisfy a terminal condition. This requirement substantially increases complexity and exacerbates the curse of dimensionality. We instead develop a stochastic maximum principle based on Malliavin calculus, which enables us to devise an iterative algorithm that needs no adjoint process. Our algorithm does, however, require the Malliavin derivative, which can be computed efficiently with a forward simulator. Empirical comparisons against standard iterative algorithms demonstrate that our approach alleviates the dimensionality bottleneck while delivering competitive performance on the considered SOCPs.
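To illustrate what computing a Malliavin derivative with a forward simulator can look like, here is a minimal sketch on a toy model that is not taken from the paper: geometric Brownian motion $dX_t = \mu X_t\,dt + \sigma X_t\,dW_t$, for which $D_s X_t = \sigma X_t$ when $s \le t$. The function name `simulate_with_malliavin`, the Euler-Maruyama discretization, and all parameter values are assumptions chosen for illustration; this is the standard first-variation recursion, not the paper's Mal-GPro algorithm.

```python
import math
import random


def simulate_with_malliavin(x0, mu, sigma, T, n_steps, s_index, rng):
    """Euler-Maruyama for dX = mu*X dt + sigma*X dW (toy model, assumed).

    Alongside the state, propagate an approximation of the Malliavin
    derivative D_s X_T for s = s_index * dt, using only forward steps.
    """
    dt = T / n_steps
    x = x0
    d = 0.0  # D_s X_t vanishes for t < s
    for k in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        growth = 1.0 + mu * dt + sigma * dw
        if k == s_index:
            # The shock at time s enters through the diffusion term sigma * X_s.
            d = sigma * x
        else:
            # Linearized dynamics carry the derivative forward in time.
            d = d * growth
        x = x * growth
    return x, d


rng = random.Random(0)  # fixed seed so the run is reproducible
x_T, d_T = simulate_with_malliavin(1.0, 0.05, 0.2, 1.0, 1000, 300, rng)
rel_gap = abs(d_T - 0.2 * x_T) / (0.2 * x_T)
print(f"X_T = {x_T:.4f}, D_s X_T = {d_T:.4f}, relative gap to sigma*X_T = {rel_gap:.4f}")
```

The derivative is obtained in the same forward pass as the state, with no backward equation and no terminal condition to match. The small gap to the closed form $\sigma X_T$ comes from the time discretization and shrinks as the step size decreases.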

Paper Structure

This paper contains 20 sections, 12 theorems, 79 equations, 5 figures, 1 table, 2 algorithms.

Key Result

Proposition 3.2

Consider SOCP $\hbox{P}_1$ with the cost functional $J(\textbf{u},\textbf{x}_0)$ (EQ:functional). Its variation along the direction $\textbf{v}(\cdot)$ is obtained by: where $H_{\textbf{u}}$ is the gradient of the Hamiltonian $H$ w.r.t. $\textbf{u}_t$ with and $\textbf{y}_t \in \mathbb{R}^n$ is an adjoint process obtained from the following backward SDE: with $\textbf{z}^i \in \mat

Figures (5)

  • Figure 1: Results of our Mal-GPro algorithm compared with the baseline algorithms, Ad-SGD and Ad-GPro, for problem (\ref{Eq:scalar-exp1}). (a) shows the optimum value of the objective, and (b) shows the optimum control decision. All of the considered algorithms learn the control decision properly, and their solutions match the analytical one. (c) tabulates the control error $\mathcal{E}_c$ (the integral of the L2-norm of the difference between the obtained control and the analytical one) for the different schemes; Mal-GPro needs a smaller step size to reach an error level comparable to Ad-SGD and Ad-GPro.
  • Figure 2: Results of our Mal-GPro algorithm compared with the baseline algorithms, Ad-SGD and Ad-GPro, for problem (\ref{Eq:scalar-exp2}). (a) shows the optimum value of the objective and (b) the optimum control decision. While Ad-SGD cannot provide a proper solution, the control trajectories of Mal-GPro and Ad-GPro approximately coincide. The control trajectory produced by Ad-GPro appears non-smooth because of the limited capacity of the parametric approximation used to simulate the adjoint process.
  • Figure 3: Results of our Mal-GPro algorithm compared with the baseline algorithms, Ad-SGD and Ad-GPro, for problem (\ref{Eq:vector-exp1}). (a) shows the optimum value of the objective and (b) the optimum control decision. All of the considered algorithms learn the control decision properly, and their solutions match the analytical one. (c) tabulates the control error $\mathcal{E}_c$ for the different schemes; Mal-GPro needs a smaller step size to reach an error level comparable to Ad-SGD and Ad-GPro.
  • Figure 4: Performance of the Mal-GPro algorithm on SOCP (\ref{Eq:vector-exp2}), compared with the baseline approaches Ad-GPro and Ad-SGD. (a) shows the control vector found by Mal-GPro, (b) the control vector found by Ad-SGD, and (c) the control vector found by Ad-GPro. In contrast to Mal-GPro, Ad-SGD cannot properly find the optimum control decision.
  • Figure 5: Results of our Mal-GPro algorithm compared with the baseline algorithm Ad-SGD for problem (\ref{Eq:vector-exp3}) with $\Delta t = 10^{-2}$. (a) shows the analytical control solution, and (b)-(c) show the control decisions obtained by Mal-GPro and Ad-SGD. The controls found by both algorithms match the analytical solution. (d) tabulates the control error $\mathcal{E}_c$ of Mal-GPro against Ad-SGD. For this problem, our algorithm attains a lower control error with less variance than Ad-SGD.
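
The control error $\mathcal{E}_c$ reported in the captions above is described as the integral of the L2-norm of the difference between the obtained control and the analytical one. A minimal sketch of how such a quantity could be evaluated on a uniform time grid follows. The function name `control_error`, the left Riemann sum, and the sample trajectories are assumptions made for illustration; the paper's exact quadrature and any averaging over sample paths are not specified here.

```python
import math


def control_error(u, u_ref, dt):
    """Approximate the integral over time of ||u_t - u_ref_t||_2.

    u and u_ref are sequences of control vectors sampled on the same
    uniform grid with spacing dt (left Riemann sum, an assumed choice).
    """
    total = 0.0
    for u_t, r_t in zip(u, u_ref):
        # Euclidean norm of the pointwise difference at this time step.
        diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(u_t, r_t)))
        total += diff_norm * dt
    return total


# Hypothetical trajectories: a constant offset of 1 in the first component
# over a unit horizon with dt = 1e-2.
dt = 1e-2
u_learned = [[1.0, 0.0] for _ in range(100)]
u_analytic = [[0.0, 0.0] for _ in range(100)]
print(control_error(u_learned, u_analytic, dt))
```

A constant offset of norm 1 held over a unit horizon integrates to approximately 1, which gives a quick sanity check on the grid spacing and units.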

Theorems & Definitions (26)

  • Definition 3.1
  • Proposition 3.2
  • proof
  • Definition 3.3
  • Theorem 4.1
  • proof
  • Corollary 4.2
  • proof
  • Theorem 4.3
  • proof
  • ...and 16 more