Particle-based algorithm for stochastic optimal control

Sebastian Reich

TL;DR

This work recasts stochastic optimal control as a pair of forward and reverse McKean–Vlasov SDEs and links the value function to the ratio of forward and reverse densities through a Cole–Hopf type transform. It develops a particle-based algorithm that fuses ensemble Kalman filtering with diffusion-map techniques to approximate the necessary drift and grad-log terms, yielding a time-dependent affine control $u_t(x)= R G(x)^T (A_t x + c_t)$. The approach is illustrated on nonlinear problems (inverted pendulum and controlled Langevin dynamics), showing that small ensembles (as few as $M=d_x+1$) can achieve robust stabilization when supplemented with diffusion-map refinements. The framework bridges diffusion-based generative modeling and stochastic control, offering a scalable, flexible route for high-dimensional control tasks and a basis for future diffusion-map enhancements and infinite-horizon extensions.
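As an illustration of how a control of the form $u_t(x)= R G(x)^T (A_t x + c_t)$ is evaluated once the gain $A_t$ and shift $c_t$ are available, here is a minimal NumPy sketch. The function name `affine_control`, the array shapes, and all numerical values are hypothetical and chosen only for illustration. The sketch does not show how $A_t$ and $c_t$ are computed from the particle ensemble.

```python
import numpy as np

def affine_control(x, A_t, c_t, R, G):
    """Evaluate u_t(x) = R G(x)^T (A_t x + c_t) at a single state x.

    Assumed shapes (hypothetical): x is (d_x,), A_t is (d_x, d_x),
    c_t is (d_x,), G(x) is (d_x, d_u), R is (d_u, d_u).
    """
    x = np.asarray(x, dtype=float)
    return R @ G(x).T @ (A_t @ x + c_t)

def G_example(x):
    # Hypothetical input matrix: a scalar control acting on the
    # second state component (e.g. a velocity) only.
    return np.array([[0.0], [1.0]])

# Illustrative values, not taken from the paper.
A_t = np.array([[-1.0, 0.0], [0.0, -2.0]])
c_t = np.array([0.5, 0.0])
R = np.array([[2.0]])
x = np.array([1.0, 3.0])

u = affine_control(x, A_t, c_t, R, G_example)
print(u)
```

Because the state enters only through $A_t x + c_t$ and $G(x)$, storing one matrix and one vector per time step is enough to evaluate the control along a trajectory.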

Abstract

The solution to a stochastic optimal control problem can be determined by computing the value function from a discretization of the associated Hamilton-Jacobi-Bellman equation. Alternatively, the problem can be reformulated in terms of a pair of forward-backward SDEs, which makes Monte-Carlo techniques applicable. More recently, the problem has also been viewed from the perspective of forward and reverse time SDEs and their associated Fokker-Planck equations. This approach is closely related to techniques used in diffusion-based generative models. Forward and reverse time formulations express the value function as the ratio of two probability density functions; one stemming from a forward McKean-Vlasov SDE and another one from a reverse McKean-Vlasov SDE. In this paper, we extend this approach to a more general class of stochastic optimal control problems and combine it with ensemble Kalman filter type and diffusion map approximation techniques in order to obtain efficient and robust particle-based algorithms.

Paper Structure (13 sections, 2 theorems, 106 equations, 4 figures)

Introduction
Mathematical problem formulation
McKean–Vlasov formulation
A brief diversion: Diffusion-based generative modeling
Numerical implementations
EnKF approximation
Combined diffusion map and EnKF approximation
Numerical implementation details
Numerical examples
Inverted pendulum
Controlled Langevin dynamics
Conclusions
Appendix

Key Result

Lemma 3.1

Given the forward evolution equation (eq:Liouville_forward) and the HJB equation (eq:HJB transformed), the probability density defined by (eq:product) satisfies the reverse time evolution equation with terminal condition $\tilde{\pi}_T = Z_T^{-1}\exp(-f) \bar{\pi}_T$, where $\zeta_t$ is an appropriate normalization constant.
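A short worked step, using only the terminal condition stated in the lemma and assuming both densities are strictly positive: taking logarithms of $\tilde{\pi}_T = Z_T^{-1}\exp(-f) \bar{\pi}_T$ expresses $f$ through the ratio of the two densities at the final time,

$$
f(x) = -\log \frac{\tilde{\pi}_T(x)}{\bar{\pi}_T(x)} - \log Z_T .
$$

If $\tilde{\pi}_T$ is required to integrate to one, the same condition fixes the constant as $Z_T = \int \exp(-f(x))\, \bar{\pi}_T(x)\, \mathrm{d}x$. This is the terminal-time instance of the density-ratio representation described in the abstract.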

Figures (4)

Figure 1: Time evolution of the ensemble mean from the forward evolution (left panel) and the reverse evolution (right panel) both in terms of pendulum position and velocity. It can be seen that the reverse evolution connects the stable and unstable equilibrium points while the forward dynamics stays close to the stable equilibrium.
Figure 2: Time evolution of the position and velocity of the pendulum under the computed affine control law. The pendulum leaves its initial stable solution to reach the unstable equilibrium at time $T=1$. The initial and final velocities are essentially zero.
Figure 3: Computed control gain $A_t$ and shift $c_t$ from the forward and reverse McKean–Vlasov evolution equations. The control is time-independent except for brief transition periods at the beginning and end of the simulation interval.
Figure 4: Comparison of the controlled and uncontrolled Langevin dynamics. Displayed is the time evolution of a single realisation of the SDE (eq:LD) with and without control.

Theorems & Definitions (6)

Remark 2.1
Lemma 3.1
proof
Remark 3.1
Lemma 3.2
proof
