Stochastic Optimal Control Matching

Carles Domingo-Enrich; Jiequn Han; Brandon Amos; Joan Bruna; Ricky T. Q. Chen

TL;DR

This paper introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) method for stochastic optimal control. SOCM learns a feedback control by least-squares fitting of a matching vector field, and jointly optimizes a family of reparameterization matrices to reduce the variance of that field. The method rests on a path-wise reparameterization trick for computing gradients of conditional expectations with respect to the initial state, and the paper relates SOCM to conditioned-diffusion and forward-backward SDE representations. On four control settings with known ground-truth solutions, SOCM achieves lower $L^2$ error than existing IDO methods in three (in some cases by an order of magnitude), which points to variance reduction as a central factor for performance. The paper also provides a practical parameterization of the matrices $M_t$, analyzes a bias-variance decomposition of the loss, and releases code to support reproducibility and extension to diffusion-model contexts.

Abstract

Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. That is, the control is learned via a least squares problem by trying to fit a matching vector field. The training loss, which is closely connected to the cross-entropy loss, is optimized with respect to both the control function and a family of reparameterization matrices which appear in the matching vector field. The optimization with respect to the reparameterization matrices aims at minimizing the variance of the matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for stochastic optimal control for three out of four control problems, in some cases by an order of magnitude. The key idea underlying SOCM is the path-wise reparameterization trick, a novel technique that may be of independent interest. Code at https://github.com/facebookresearch/SOC-matching
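The least-squares idea in the abstract can be illustrated with a toy regression. The sketch below is not the SOCM algorithm and does not simulate any SDE: it assumes a hypothetical three-state problem with a made-up "true control" table `true_u`, and replaces the matching vector field by the true value plus Gaussian noise. It only shows two generic facts that the abstract relies on: the minimizer of a squared loss against a noisy target is the conditional mean of that target, and a lower-variance target with the same mean gives a more accurate fit from the same number of samples.

```python
import random


def fit_tabular_least_squares(samples, n_states):
    """Minimise sum (u[x] - w)^2 over tables u.

    For a tabular u the minimiser is the per-state average of the targets w,
    i.e. an empirical conditional expectation E[w | x].
    """
    sums = [0.0] * n_states
    counts = [0] * n_states
    for x, w in samples:
        sums[x] += w
        counts[x] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def make_samples(rng, true_u, noise_std, n):
    """Draw (state, noisy target) pairs; the target is unbiased for true_u[x]."""
    samples = []
    for _ in range(n):
        x = rng.randrange(len(true_u))
        w = true_u[x] + rng.gauss(0.0, noise_std)
        samples.append((x, w))
    return samples


def l2_error(u, true_u):
    """Mean squared error between a fitted table and the reference table."""
    return sum((a - b) ** 2 for a, b in zip(u, true_u)) / len(true_u)


rng = random.Random(0)
true_u = [-1.0, 0.5, 2.0]  # hypothetical ground-truth values, for illustration only

# Same conditional mean, different target variance.
fit_low = fit_tabular_least_squares(make_samples(rng, true_u, 0.1, 2000), 3)
fit_high = fit_tabular_least_squares(make_samples(rng, true_u, 3.0, 2000), 3)

err_low = l2_error(fit_low, true_u)
err_high = l2_error(fit_high, true_u)
print(f"L2 error, low-variance target:  {err_low:.2e}")
print(f"L2 error, high-variance target: {err_high:.2e}")
```

Both fits converge to the same table as the sample count grows; the difference is only in how fast, which is the sense in which reducing the variance of the regression target matters.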

Paper Structure (43 sections, 22 theorems, 157 equations, 10 figures, 1 table, 3 algorithms)

Introduction
Framework
Setup and Preliminaries
Cost functional and value function
Hamilton-Jacobi-Bellman equation and optimal control
A pair of forward and backward SDEs (FBSDEs)
An analytic expression for the value function
Conditioned diffusions
Existing approaches and related work
Low-dimensional case: solving the HJB equation
High dimensional methods leveraging FBSDEs
The relative entropy loss and the adjoint method
The cross-entropy loss
Variance and log-variance losses
Moment loss
...and 28 more sections

Key Result

Lemma 1

Figures (10)

Figure 1: Plots of the $L^2$ error incurred by the learned control (top), and the norm squared of the gradient with respect to the parameters $\theta$ of the control (bottom), for the Quadratic Ornstein Uhlenbeck (easy) setting and for each IDO loss. Both plots show exponential moving averages computed from the trajectories used during training.
Figure 2: Plots of the $L^2$ error of the learned control for the Linear Ornstein Uhlenbeck and Double Well settings.
Figure 3: Plots of the $L^2$ error incurred by the learned control (top), and the norm squared of the gradient with respect to the parameters $\theta$ of the control (bottom), for the Quadratic Ornstein Uhlenbeck (hard) setting and for each IDO loss. All the algorithms use a warm-started control (see the section on warm-starting the control).
Figure 4: This plot shows the control objective values for different algorithms (Adjoint, SOCM, and Cross-entropy) across multiple dimensions, with error bars indicating the standard deviations. The y-axis is restricted to [0, 0.1] for better visibility of the lower range values; cross-entropy takes value $2.915 \pm 0.008$ at $d=64$.
Figure 5: Plots of the control objective for the four settings.
...and 5 more figures

Theorems & Definitions (34)

Lemma 1: Path-integral representation of the optimal control (Kappen, 2005)
Lemma 2: Cross-entropy loss in terms of control $L^2$ error
Theorem 1: SOCM loss
Proposition 1: Path-wise reparameterization trick for stochastic optimal control
Proposition 2: Bias-variance decomposition of the SOCM loss
Lemma 3
Theorem 2: Optimal reparameterization matrices
Theorem 3: Novikov's theorem
Theorem 4: Girsanov theorem
Corollary 1: Girsanov theorem for SDEs
...and 24 more
