Dynamic Programming in Probability Spaces via Optimal Transport

Antonio Terpin, Nicolas Lanzetti, Florian Dörfler

TL;DR

This work addresses discrete-time finite-horizon optimal control where the state is a probability measure over a ground space. It demonstrates a separation principle in which the DP solution in probability spaces is constructed from a ground-space DP for individual agents and a single (multi-marginal) optimal transport problem that allocates agents to trajectories. The key contributions include a rigorous formulation linking J_k to a multi-marginal OT with cost j_k, constructive procedures for optimal state-input distributions, and conditions ensuring existence and computability (offline ground-space DP plus OT) with offline/online computation trade-offs. The results unify and extend prior fleet-steering approaches by showing they are special cases of the DP-in-probability-space framework and offer practical guidance for designing transport costs via learned cost-to-go terms. Through examples and proofs, the paper clarifies when the two-marginal simplifications suffice and where multi-marginal formulations are indispensable, highlighting implications for scalable multi-agent control and distribution steering.

Abstract

We study discrete-time finite-horizon optimal control problems in probability spaces, whereby the state of the system is a probability measure. We show that, in many instances, the solution of dynamic programming in probability spaces results from two ingredients: (i) the solution of dynamic programming in the "ground space" (i.e., the space on which the probability measures live) and (ii) the solution of an optimal transport problem. From a multi-agent control perspective, a separation principle holds: The "low-level control of the agents of the fleet" (how does one reach the destination?) and "fleet-level control" (who goes where?) are decoupled.
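
The two ingredients can be illustrated on a toy fleet problem. The sketch below is not the authors' implementation: the ground space (five cells on a line), the actions, the unit movement cost, the horizon, and the function names `ground_cost_to_go` and `assign` are all assumptions made for illustration. It also assumes a finite fleet with a prescribed target configuration, so the optimal transport problem reduces to an assignment problem, solved here by brute force.

```python
from itertools import permutations

# Hypothetical toy setup (not from the paper): robots on a line of five cells.
STATES = range(5)        # ground space: cells 0..4
ACTIONS = (-1, 0, 1)     # move left, stay, move right
HORIZON = 3              # number of time steps N
INF = float("inf")


def step(x, u):
    """Deterministic ground-space dynamics; moves off the line are disallowed."""
    y = x + u
    return y if y in STATES else None


def ground_cost_to_go(target):
    """Ingredient (i): backward DP in the ground space.

    j[k][x] is the minimum movement cost for one agent that is in cell x at
    time k to be in cell `target` at time HORIZON (INF if unreachable).
    """
    j = [[INF] * len(STATES) for _ in range(HORIZON + 1)]
    j[HORIZON][target] = 0.0
    for k in range(HORIZON - 1, -1, -1):
        for x in STATES:
            for u in ACTIONS:
                y = step(x, u)
                if y is not None:
                    j[k][x] = min(j[k][x], abs(u) + j[k + 1][y])
    return j


def assign(starts, targets):
    """Ingredient (ii): optimal transport between two empirical measures.

    With equally many agents and targets this is an assignment problem whose
    cost c(x, y) is the ground-space cost-to-go at time 0. Brute force over
    permutations is only viable for very small fleets.
    """
    cost = {y: ground_cost_to_go(y)[0] for y in set(targets)}

    def total(perm):
        return sum(cost[y][x] for x, y in zip(starts, perm))

    best = min(permutations(targets), key=total)
    return list(zip(starts, best)), total(best)


plan, total_cost = assign(starts=[0, 1, 4], targets=[2, 3, 3])
print(plan, total_cost)
```

Here the ground-space DP answers "how does one reach the destination?" for a single agent, and the assignment answers "who goes where?". For larger fleets, the brute-force search would be replaced by a dedicated assignment or linear programming solver.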

Paper Structure

This paper contains 32 sections, 7 theorems, 41 equations, 2 figures, 1 table.

Key Result

Theorem 4.1

Consider the setting of \ref{problem:finitehorizon:rigorous}. At every stage $k$, the following hold:

Figures (2)

  • Figure 1: Sensitivity of offline and online computational effort to the number of agents $M$ and states $|X|$ for the DPA in probability spaces (empty marker) and \ref{corollary:lifting:ot} (filled marker), with $|U|$, $N$, and $O$ being the number of actions, time-steps, and operations, respectively. We omit markers when numbers exceed the IEEE 754 floating point representation. The DPA in probability spaces is infeasible already for a small number of agents. The recipe in \ref{corollary:lifting:ot}, instead, remains feasible for large fleet sizes.
  • Figure 1: Depiction of \ref{example:split-necessary}, \ref{example:multi-marginal}, and \ref{example:lifting:localnoise}.
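
The scaling contrast described in the first caption can be made concrete with a simple count. The sketch below is an illustration under assumptions rather than the paper's operation count: it ignores $|U|$ and $N$ and only compares the number of states a DP would have to enumerate. It assumes indistinguishable agents on a finite ground space, so that a fleet configuration is a multiset of $M$ states; the function name `num_fleet_configurations` and the choice of 25 states are hypothetical.

```python
from math import comb


def num_fleet_configurations(num_agents, num_states):
    """Number of distinct configurations of indistinguishable agents over a
    finite ground space (stars-and-bars count of multisets)."""
    return comb(num_agents + num_states - 1, num_states - 1)


NUM_STATES = 25  # hypothetical ground space, e.g. a 5-by-5 grid

for num_agents in (2, 10, 100):
    # A DP in probability spaces ranges over fleet configurations, whereas a
    # ground-space DP ranges over NUM_STATES regardless of the fleet size.
    print(num_agents, num_fleet_configurations(num_agents, NUM_STATES), NUM_STATES)
```

The configuration count grows combinatorially with the fleet size, while the ground-space state count does not depend on it.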

Theorems & Definitions (26)

  • Example 1.1: Deterministic optimal control
  • Example 1.2: Distribution steering
  • Example 1.3: Large-scale multi-agent systems
  • Remark 2.1
  • Example 3.1: Robots in a grid
  • Example 3.2: Robots in a grid, continued
  • Remark 3.3
  • Example 3.4: Robots in a grid, continued
  • Example 3.5: Robots in a grid, continued
  • Definition 3.6: DPA
  • ...and 16 more