An Efficient On-Policy Deep Learning Framework for Stochastic Optimal Control
Mengjian Hua, Mathieu Laurière, Eric Vanden-Eijnden
TL;DR
This work tackles the scalability bottleneck of differentiating through controlled SDEs in stochastic optimal control (SOC). It combines the Girsanov change of measure with an on-policy gradient that does not require differentiating through simulated trajectories. The authors derive a gradient formula that is evaluated along paths sampled from the current control, without backpropagating through the SDE solver, and provide a practical alternative objective whose gradient can be computed with automatic differentiation. The framework is applied to (i) constructing Föllmer processes for sampling from unnormalized distributions and (ii) fine-tuning the drift of pre-trained diffusion models to tilt their target distributions. In both cases it substantially reduces memory and computation compared with baselines that differentiate through the SDE. Overall, the method offers a scalable, memory-efficient way to train neural controllers for SOC and related probabilistic modeling tasks, with applications to sampling and generative modeling.
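As a worked illustration of how a Girsanov change of measure can yield an on-policy gradient without differentiating through the SDE, the derivation below uses the standard likelihood-ratio argument under simplifying assumptions that are not taken from the paper: unit diffusion coefficient, a drift b plus a control u_θ, a running cost ½|u_θ|² + f and a terminal cost g. The symbols (u_θ, b, f, g, the frozen reference control v) are illustrative, and the paper's exact gradient formula and alternative objective may differ.

```latex
\begin{align}
% Assumed controlled SDE and cost (illustrative notation)
dX_t &= \big(b(X_t,t) + u_\theta(X_t,t)\big)\,dt + dW_t, \qquad X_0 = x_0, \\
J(\theta) &= \mathbb{E}_{P^{u_\theta}}\big[C_\theta(X)\big], \qquad
C_\theta(X) = \int_0^T \Big(\tfrac12\,|u_\theta(X_t,t)|^2 + f(X_t,t)\Big)\,dt + g(X_T), \\
% Girsanov: reweight paths simulated under a frozen reference control v
J(\theta) &= \mathbb{E}_{P^{v}}\!\left[C_\theta(X)\,\frac{dP^{u_\theta}}{dP^{v}}(X)\right], \\
\frac{dP^{u_\theta}}{dP^{v}}(X) &= \exp\!\left(\int_0^T (u_\theta - v)(X_t,t)\cdot dW^{v}_t
  - \tfrac12 \int_0^T |u_\theta - v|^2(X_t,t)\,dt\right), \\
% W^v is a Brownian motion under P^v
W^{v}_t &= X_t - x_0 - \int_0^t (b + v)(X_s,s)\,ds, \\
% differentiate in theta with v held fixed, then set v = u_theta
\nabla_\theta J(\theta) &= \mathbb{E}_{P^{u_\theta}}\!\left[\nabla_\theta C_\theta(X)
  + C_\theta(X)\int_0^T \big(\partial_\theta u_\theta(X_t,t)\big)^{\top} dW_t\right].
\end{align}
```

Here ∂_θ u_θ is the Jacobian of the control with respect to its parameters, and ∇_θ C_θ(X) is taken with the sampled path X held fixed. Because the paths are drawn under the frozen control v, they do not depend on θ, so only the control network is differentiated at the visited states and no gradient flows through the SDE solver.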
Abstract
We present a novel on-policy algorithm for solving stochastic optimal control (SOC) problems. By leveraging the Girsanov theorem, our method directly computes on-policy gradients of the SOC objective without expensive backpropagation through stochastic differential equations or the solution of adjoint problems. This approach significantly accelerates the optimization of neural network control policies while scaling efficiently to high-dimensional problems and long time horizons. We evaluate our method on classical SOC benchmarks as well as applications to sampling from unnormalized distributions via Schrödinger-Föllmer processes and fine-tuning pre-trained diffusion models. Experimental results demonstrate substantial improvements in both computational speed and memory efficiency compared to existing approaches.
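To make the idea of an on-policy Girsanov gradient concrete, here is a minimal NumPy sketch of the textbook likelihood-ratio (score-function) estimator on a hypothetical one-dimensional linear-quadratic toy problem. Everything about the setup is an assumption for illustration, not the paper's method or benchmark: unit noise, zero uncontrolled drift, a linear feedback control u_θ(x) = θx, and running and terminal costs ½x². The function name `girsanov_gradient` and its parameters are hypothetical. Paths are simulated once with Euler-Maruyama and then treated as fixed data, so the gradient needs only derivatives of the cost and control at the visited states.

```python
import numpy as np


def girsanov_gradient(theta, x0=1.0, T=1.0, n_steps=100, n_paths=200_000, seed=0):
    """Likelihood-ratio (Girsanov) estimate of dJ/dtheta for a toy 1D problem.

    Hypothetical toy setup (not from the paper):
      dX_t = u_theta(X_t) dt + dW_t,   u_theta(x) = theta * x
      J(theta) = E[ int_0^T (0.5 u^2 + 0.5 X^2) dt + 0.5 X_T^2 ]
    Returns (gradient estimate, cost estimate). No gradient flows through
    the Euler-Maruyama recursion: paths are used as fixed samples.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0)
    cost = np.zeros(n_paths)   # C_theta(X), accumulated along each path
    dcost = np.zeros(n_paths)  # explicit d/dtheta of C_theta with the path held fixed
    score = np.zeros(n_paths)  # sum_k grad_theta u(X_k) * dW_k (discrete Girsanov score)
    for _ in range(n_steps):
        u = theta * x
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        cost += (0.5 * u**2 + 0.5 * x**2) * dt  # left Riemann sum of running cost
        dcost += u * x * dt                     # d/dtheta (0.5 theta^2 x^2) = theta x^2 = u x
        score += x * dW                         # grad_theta u_theta(x) = x
        x = x + u * dt + dW                     # Euler-Maruyama step
    cost += 0.5 * x**2                          # terminal cost g(X_T)
    # Mean-cost baseline: E[score] = 0, so subtracting it keeps the estimator
    # unbiased up to O(1/n_paths) while reducing its variance.
    weights = cost - cost.mean()
    return float(np.mean(dcost + weights * score)), float(cost.mean())
```

For this toy problem the Riccati solution is constant (p ≡ 1), so the optimal feedback is u(x) = -x. Calling `girsanov_gradient(-1.0)` should therefore give a gradient close to zero, while `girsanov_gradient(0.0)` should be near the continuous-time value 13/6 ≈ 2.17, up to Monte Carlo and time-discretization error. In an automatic-differentiation framework, one generic way to obtain the same estimate is to differentiate a surrogate such as C_θ(X̄) + sg[C(X̄) - baseline] · Σ_k u_θ(X̄_k) ΔW_k, where X̄ and ΔW are stored samples and sg denotes stop-gradient. This construction is a standard illustration and not necessarily the paper's alternative objective.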
