A Taxonomy of Loss Functions for Stochastic Optimal Control

Carles Domingo-Enrich

TL;DR

This work clarifies how deep SOC loss functions relate by grouping them into classes that share the same gradient in expectation, meaning they have the same optimization landscape but differ in gradient variance. It introduces three novel losses (Work-SOCM, Cost-SOCM, Unweighted SOCM) and provides a formal taxonomy linking existing losses to each class. The authors validate the taxonomy with simple, synthetic SOC experiments, showing that gradient variance, problem dimensionality, and cost magnitudes govern convergence speed and stability more than the exact gradient structure. The results offer a unified lens for selecting SOC losses tailored to problem features, especially for reward fine-tuning in diffusion/flow models.

Abstract

Stochastic optimal control (SOC) aims to direct the behavior of noisy systems and has widespread applications in science, engineering, and artificial intelligence. In particular, reward fine-tuning of diffusion and flow matching models and sampling from unnormalized densities can be recast as SOC problems. A recent work has introduced Adjoint Matching (Domingo-Enrich et al., 2024), a loss function for SOC problems that vastly outperforms existing loss functions in the reward fine-tuning setup. The goal of this work is to clarify the connections between all the existing (and some new) SOC loss functions. Namely, we show that SOC loss functions can be grouped into classes that share the same gradient in expectation, which means that their optimization landscape is the same; they only differ in their gradient variance. We perform simple SOC experiments to understand the strengths and weaknesses of different loss functions.
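
The claim that two losses can share a gradient in expectation and differ only in gradient variance can be illustrated with a toy problem. The sketch below is not one of the paper's SOC losses. It assumes a one-dimensional objective $\mathbb{E}_{x \sim \mathcal{N}(\theta, 1)}[x^2]$, whose exact gradient is $2\theta$, and compares a score-function (REINFORCE-style) estimator with a pathwise (reparameterized) estimator. The function names and the constants `theta` and `n` are hypothetical choices made for this illustration.

```python
import numpy as np


def score_function_grad(theta, n, rng):
    """Per-sample score-function estimates of d/dtheta E[x^2], x ~ N(theta, 1)."""
    x = theta + rng.standard_normal(n)
    # The gradient of log N(x; theta, 1) with respect to theta is (x - theta).
    return x**2 * (x - theta)


def pathwise_grad(theta, n, rng):
    """Per-sample pathwise estimates: write x = theta + eps, so d/dtheta x^2 = 2x."""
    x = theta + rng.standard_normal(n)
    return 2.0 * x


rng = np.random.default_rng(0)
theta = 1.5      # hypothetical parameter value
n = 200_000      # number of Monte Carlo samples

g_sf = score_function_grad(theta, n, rng)
g_pw = pathwise_grad(theta, n, rng)

exact = 2.0 * theta
print("exact gradient:     ", exact)
print("score-function mean:", g_sf.mean(), " variance:", g_sf.var())
print("pathwise mean:      ", g_pw.mean(), " variance:", g_pw.var())
```

Both estimators average to the same gradient, so gradient descent with an infinite batch would follow the same trajectory under either one. With finite batches, the score-function estimator is far noisier. In this toy setting its per-sample variance is $\theta^4 + 14\theta^2 + 15$, against $4$ for the pathwise estimator.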

Paper Structure

This paper contains 41 sections, 11 theorems, 92 equations, 5 figures, 1 algorithm.

Key Result

Proposition 1

The gradients of the losses $\mathcal{L}_{\mathrm{Adj-Match}}$ and $\mathcal{L}_{\mathrm{Work-SOCM}}$ are equal in expectation; in particular, for any $x \in \mathbb{R}^d$, $t \in T$, and $M$ fulfilling the assumption on $M$ (ass:M), we have that […]. Hence, the only critical point of the loss $\mathcal{L}_{\mathrm{Work-SOCM}}$ is the optimal control $u^*$.

Figures (5)

  • Figure 1: Training losses for stochastic optimal control problems. Losses in blue scale to high dimensions, while losses in red do not, as their gradient variance blows up exponentially with the dimension. By Theorem 1, losses in the same block (there are five different blocks) have gradients that are equal in expectation, i.e. taking an infinite batch size would yield the same gradient update. Novel losses are underlined, and losses that admit a Sticking The Landing version are identified with the suffix (+STL).
  • Figure 2: Control $L^2$ error incurred by each loss function throughout training, on five different settings.
  • Figure 3: Control $L^2$ error incurred by the Adjoint Matching, Continuous Adjoint and Discrete Adjoint losses (with and without the Sticking The Landing trick), on five different settings.
  • Figure 4: Control $L^2$ error incurred by each loss function throughout training, on five different settings.
  • Figure 5: Control $L^2$ error incurred by each loss function throughout training, on five different settings.
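
The caption of Figure 1 states that the gradient variance of some losses blows up exponentially with the dimension. As a toy illustration of how this can happen, the sketch below assumes an importance weight between two Gaussians, $\mathcal{N}(a\mathbf{1}, I_d)$ against $\mathcal{N}(0, I_d)$. This mechanism is an assumption made for the example and is not taken from the paper. The weight has mean $1$ in every dimension, while its variance is $e^{a^2 d} - 1$. The function names and the shift `a` are hypothetical.

```python
import numpy as np


def weight_stats(d, a=0.5, n=100_000, seed=0):
    """Empirical mean and variance of the weight w = N(z; a*1, I) / N(z; 0, I), z ~ N(0, I_d)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, d))
    # Log-density ratio of the two Gaussians: a * sum_i z_i - d * a^2 / 2.
    log_w = a * z.sum(axis=1) - 0.5 * d * a**2
    w = np.exp(log_w)
    return w.mean(), w.var()


def exact_variance(d, a=0.5):
    """Closed-form variance of the weight: exp(a^2 * d) - 1."""
    return np.exp(d * a**2) - 1.0


for d in (1, 4, 16):
    mean, var = weight_stats(d)
    print(f"d={d:2d}  empirical mean={mean:.3f}  "
          f"empirical var={var:.3f}  exact var={exact_variance(d):.3f}")
```

The mean stays at $1$ while the variance grows exponentially in $d$. Any gradient estimator multiplied by such a weight becomes harder to use as the dimension increases, even though its expectation is unchanged. For large $d$ the empirical variance is itself a noisy estimate, so the closed form is the more reliable reference.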

Theorems & Definitions (14)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 1: A taxonomy of SOC losses
  • Theorem 2: Girsanov theorem
  • Corollary 1: Girsanov theorem for SDEs
  • Theorem 3: Hamilton-Jacobi-Bellman equation
  • Theorem 4: Path-wise reparameterization trick (Domingo-Enrich et al., 2023, Prop. C.3)
  • Corollary 2: Path-wise reparameterization trick for stochastic optimal control (Domingo-Enrich et al., 2023, Prop. 1)
  • Theorem 5: Adjoint method for SDEs (Lemma 8 of Domingo-Enrich et al., 2023; Li et al., 2020; Kidger et al., 2021)
  • ...and 4 more