Stochastic Optimal Control Matching

Carles Domingo-Enrich, Jiequn Han, Brandon Amos, Joan Bruna, Ricky T. Q. Chen

TL;DR

Stochastic Optimal Control Matching (SOCM) is a novel Iterative Diffusion Optimization (IDO) method for stochastic optimal control. It learns a feedback control by least-squares fitting of a matching vector field, and jointly optimizes a family of reparameterization matrices $M_t$ to reduce the variance of that vector field. The method rests on a path-wise reparameterization trick for computing gradients of conditional expectations with respect to the initial state, which relates SOCM to conditioned-diffusion and forward-backward SDE representations. On four control settings with known ground-truth controls, SOCM achieves lower $L^2$ error than existing IDO methods in three, in some cases by an order of magnitude, which points to variance reduction as a central factor in performance. The paper also gives a practical parameterization of $M_t$, analyzes a bias-variance decomposition of the loss, and releases code to support reproducibility and extensions to diffusion-model settings.

Abstract

Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. That is, the control is learned via a least squares problem by trying to fit a matching vector field. The training loss, which is closely connected to the cross-entropy loss, is optimized with respect to both the control function and a family of reparameterization matrices which appear in the matching vector field. The optimization with respect to the reparameterization matrices aims at minimizing the variance of the matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for stochastic optimal control for three out of four control problems, in some cases by an order of magnitude. The key idea underlying SOCM is the path-wise reparameterization trick, a novel technique that may be of independent interest. Code at https://github.com/facebookresearch/SOC-matching
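
The abstract likens SOCM to the conditional score matching loss for diffusion models, where a least-squares fit to a noisy per-sample target recovers a conditional expectation. The sketch below illustrates that principle on the diffusion-model side only; it is not the SOCM loss. It assumes a 1D standard Gaussian data distribution, Gaussian noising, and a linear model with a single slope parameter, and the function name `fit_linear_score` is hypothetical.

```python
import numpy as np


def fit_linear_score(noise_std=0.5, n=200_000, seed=0):
    """Fit s(x) = slope * x by least squares against conditional-score targets.

    Toy setup (an assumption of this sketch): x0 ~ N(0, 1) and
    xt = x0 + noise_std * eps, so the marginal of xt is N(0, 1 + noise_std**2)
    and its true score is -x / (1 + noise_std**2).
    """
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n)
    eps = rng.standard_normal(n)
    xt = x0 + noise_std * eps

    # Per-sample target: gradient in xt of log N(xt; x0, noise_std**2),
    # which equals -(xt - x0) / noise_std**2 = -eps / noise_std.
    target = -eps / noise_std

    # Closed-form least-squares slope for a one-parameter linear model.
    return np.dot(xt, target) / np.dot(xt, xt)


if __name__ == "__main__":
    noise_std = 0.5
    print("fitted slope:", fit_linear_score(noise_std))
    print("exact slope: ", -1.0 / (1.0 + noise_std**2))
```

Each target is a noisy, per-sample quantity, yet the regression recovers the marginal score because the least-squares minimizer is the conditional expectation of the target given `xt`. The variance of the targets does not change that minimizer, but it does affect how quickly a stochastic optimizer finds it, which is the quantity the abstract says SOCM reduces through its reparameterization matrices.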

Paper Structure (43 sections, 22 theorems, 157 equations, 10 figures, 1 table, 3 algorithms)

Key Result

Lemma 1
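
Lemma 1 is the path-integral representation of the optimal control. As a rough illustration, the sketch below uses the standard form of that representation for control-affine problems with quadratic control cost, $u^*(x,t) = \lambda\,\sigma^\top \nabla_x \log \mathbb{E}[\exp(-g(X_T)/\lambda) \mid X_t = x]$, on a toy problem chosen here rather than taken from the paper: 1D dynamics $dX = \sigma u\,dt + \sqrt{\lambda}\,\sigma\,dB$ with zero drift, zero running state cost, and terminal cost $g(x) = \gamma x^2/2$. The notation and function names are assumptions. Because the drift is zero, $X_T$ can be sampled in one step and the gradient taken with a plain Gaussian reparameterization, which is much simpler than the path-wise trick with matrices $M_t$ described in the paper.

```python
import numpy as np


def optimal_control_mc(x, t, T=1.0, sigma=1.0, lam=1.0, gamma=1.0,
                       n=200_000, seed=0):
    """Monte Carlo estimate of u*(x, t) for the toy problem described above."""
    rng = np.random.default_rng(seed)
    # Uncontrolled process: X_T | X_t = x is Gaussian with mean x and this std.
    std = np.sqrt(lam * sigma**2 * (T - t))
    x_T = x + std * rng.standard_normal(n)  # reparameterized in x, dX_T/dx = 1

    # Path weights exp(-g(X_T) / lam) with g(x) = gamma * x**2 / 2.
    w = np.exp(-0.5 * gamma * x_T**2 / lam)
    grad_g = gamma * x_T

    # lam * sigma * d/dx log E[w] = -sigma * E[g'(X_T) w] / E[w].
    return -sigma * np.sum(grad_g * w) / np.sum(w)


def optimal_control_exact(x, t, T=1.0, sigma=1.0, lam=1.0, gamma=1.0):
    """Closed-form control from the scalar Riccati solution of the same problem."""
    return -sigma * gamma * x / (1.0 + gamma * sigma**2 * (T - t))


if __name__ == "__main__":
    print("Monte Carlo:", optimal_control_mc(1.0, 0.0))
    print("closed form:", optimal_control_exact(1.0, 0.0))
```

In this linear-quadratic toy case the weights are well behaved. With running costs, nonlinear drift, or higher dimension, the same ratio estimator can have much higher variance, which is the regime the paper's variance-reduction techniques target.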

Figures (10)

  • Figure 1: Plots of the $L^2$ error incurred by the learned control (top), and the norm squared of the gradient with respect to the parameters $\theta$ of the control (bottom), for the Quadratic Ornstein Uhlenbeck (easy) setting and for each IDO loss. Both plots show exponential moving averages computed from the trajectories used during training.
  • Figure 2: Plots of the $L^2$ error of the learned control for the Linear Ornstein Uhlenbeck and Double Well settings.
  • Figure 3: Plots of the $L^2$ error incurred by the learned control (top), and the norm squared of the gradient with respect to the parameters $\theta$ of the control (bottom), for the Quadratic Ornstein Uhlenbeck (hard) setting and for each IDO loss. All the algorithms use a warm-started control (see the warm-start section of the paper).
  • Figure 4: This plot shows the control objective values for different algorithms (Adjoint, SOCM, and Cross-entropy) across multiple dimensions, with error bars indicating the standard deviations. The y-axis is restricted to [0, 0.1] for better visibility of the lower range values; cross-entropy takes value $2.915 \pm 0.008$ at $d=64$.
  • Figure 5: Plots of the control objective for the four settings.
  • ...and 5 more figures

Theorems & Definitions (34)

  • Lemma 1: Path-integral representation of the optimal control (Kappen, 2005)
  • Lemma 2: Cross-entropy loss in terms of control $L^2$ error
  • Theorem 1: SOCM loss
  • Proposition 1: Path-wise reparameterization trick for stochastic optimal control
  • Proposition 2: Bias-variance decomposition of the SOCM loss
  • Lemma 3
  • Theorem 2: Optimal reparameterization matrices
  • Theorem 3: Novikov's theorem
  • Theorem 4: Girsanov theorem
  • Corollary 1: Girsanov theorem for SDEs
  • ...and 24 more