Trust Region Constrained Measure Transport in Path Space for Stochastic Optimal Control and Inference

Denis Blessing, Julius Berner, Lorenz Richter, Carles Domingo-Enrich, Yuanqi Du, Arash Vahdat, Gerhard Neumann

Abstract

Solving stochastic optimal control problems with quadratic control costs can be viewed as approximating a target path space measure, e.g. via gradient-based optimization. In practice, however, this optimization is challenging, in particular if the target measure differs substantially from the prior. In this work, we therefore approach the problem by iteratively solving constrained problems with trust regions, which approach the target measure gradually and systematically. It turns out that this trust-region-based strategy can be understood as a geometric annealing from the prior to the target measure, where the trust regions provide a principled way of choosing the time steps in the annealing path. We demonstrate in multiple optimal control applications that our novel method can improve performance significantly, including tasks in diffusion-based sampling, transition path sampling, and fine-tuning of diffusion models.
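The link between trust regions and annealing step sizes can be illustrated on a toy problem. The following is a minimal sketch on a finite state space, not the path-space method of the paper: it assumes a discrete prior and target with strictly positive probabilities, a KL trust region of radius `eps`, and a bisection search for the next annealing parameter. All function names and the example distributions are hypothetical.

```python
import math


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def geometric_mix(prior, target, beta):
    """Geometric interpolation p_beta(x) ∝ prior(x)^(1 - beta) * target(x)^beta."""
    logs = [(1.0 - beta) * math.log(p) + beta * math.log(q)
            for p, q in zip(prior, target)]
    shift = max(logs)  # subtract the maximum for numerical stability
    return normalize([math.exp(v - shift) for v in logs])


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def next_beta(prior, target, beta, eps, tol=1e-10):
    """Largest beta' in [beta, 1] with KL(p_beta' || p_beta) <= eps (by bisection).

    Along the geometric path the divergence from p_beta is non-decreasing
    in beta', so bisection is valid.
    """
    current = geometric_mix(prior, target, beta)
    if kl(geometric_mix(prior, target, 1.0), current) <= eps:
        return 1.0  # the target itself lies inside the trust region
    lo, hi = beta, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl(geometric_mix(prior, target, mid), current) <= eps:
            lo = mid
        else:
            hi = mid
    return lo


def annealing_schedule(prior, target, eps):
    """Annealing parameters 0 = beta_0 < beta_1 < ... = 1 chosen by the trust region."""
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    betas = [0.0]
    while betas[-1] < 1.0:
        step = next_beta(prior, target, betas[-1], eps)
        if step <= betas[-1]:
            raise ValueError("trust region too small to make progress")
        betas.append(step)
    return betas


# Hypothetical example: a uniform prior and a sharply peaked target.
prior = [0.25, 0.25, 0.25, 0.25]
target = [0.01, 0.01, 0.01, 0.97]
betas = annealing_schedule(prior, target, eps=0.05)
print(len(betas) - 1, "steps:", [round(b, 3) for b in betas])
```

In this toy setting the step sizes are not fixed in advance: each one is the largest step that keeps the next distribution within the trust region around the current one.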

Paper Structure

This paper contains 89 sections, 13 theorems, 156 equations, 13 figures, 1 table, 2 algorithms.

Key Result

Proposition 2.2

Let $\mathbbm{Q}$ be the optimal path measure defined in (eq:opt_change_of_measure). The intermediate optimal path measures corresponding to (eq:constrained optimization) then satisfy ... and the optimal change of measure w.r.t. the base measure $\mathbbm{P}$ is given by ...

Footnote: For notational convenience we assume an $X_0$-independent normalizing constant here and hereafter, which is possible whe...

Footnote: As usual, the em...
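For orientation only, a geometric annealing between a base measure $\mathbbm{P}$ and a target $\mathbbm{Q}$ is conventionally written as below. This is the generic textbook form, not a reconstruction of the truncated statement above; the annealing parameter $\beta$ and the normalizing constant $\mathcal{Z}_\beta$ are notation assumed here.

$$
\frac{\mathrm{d}\mathbbm{P}_{\beta}}{\mathrm{d}\mathbbm{P}} = \frac{1}{\mathcal{Z}_{\beta}} \left( \frac{\mathrm{d}\mathbbm{Q}}{\mathrm{d}\mathbbm{P}} \right)^{\beta}, \qquad \mathcal{Z}_{\beta} = \mathbb{E}_{\mathbbm{P}}\!\left[ \left( \frac{\mathrm{d}\mathbbm{Q}}{\mathrm{d}\mathbbm{P}} \right)^{\beta} \right], \qquad \beta \in [0, 1].
$$

Setting $\beta = 0$ recovers $\mathbbm{P}$ and $\beta = 1$ recovers $\mathbbm{Q}$.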

Figures (13)

  • Figure 1: Illustration of a sequence of distributions $(\mathbbm{P}^{u_i})_i$ resulting from our trust region method (orange) and a measure transport corresponding to non-equispaced geometric annealing (blue), leading to high variance in the importance weights for the initial steps.
  • Figure 2: Performance criteria for a Gaussian mixture target density with varying dimension $d$, averaged across four seeds. We show the errors of estimating the optimal control, the log-normalizing constant, as well as the Sinkhorn and total variation distances over different dimensions (from left to right). We observe that our trust region methods (TR-SOCM and TR-LV) are the only methods that perform well in high dimensions.
  • Figure 3: The left table reports $|\Delta \log \mathcal{Z}|$ values for the Many Well target across different dimensions $d$. The middle plot compares the log-variance loss of our trust region method (TR-LV) with that of Langevin preconditioning on the GMM target in dimension $d = 100$. The rightmost figure presents an ablation analysis of key components in our method, highlighting the importance of trust regions in preventing mode collapse and achieving low control error. All results are averaged across four seeds.
  • Figure 4: We compare our trust region method (TR-LV) with Diffusion Path Sampler (TPS-DPS; Seong et al., 2024) on Alanine Dipeptide and Chignolin. All results are averaged over three random seeds, with both the mean and standard deviation reported. Our method identifies transition paths more consistently and robustly, as evidenced by higher THP values and lower standard deviations.
  • Figure 5: Comparison of Adjoint Matching against Trust Region SOCM for Stable Diffusion 1.5 fine-tuning w.r.t. four quality metrics, where $\eta = 0$ and $\eta = 1$ refer to ODE (DDIM) and SDE (DDPM) inference, respectively.
  • ...and 8 more figures

Theorems & Definitions (33)

  • Remark 2.1: Controlling the variance of importance weights
  • Proposition 2.2: Optimal change of measure as geometric annealing
  • proof
  • Proposition 2.3: Equidistant steps on statistical manifold
  • proof
  • Remark 2.4: Trust regions for general measures
  • Proposition 2.5: Optimality for trust region SOC problems
  • proof
  • Corollary 3.1: Sampling from tilted distributions
  • proof
  • ...and 23 more