Dynamic Programming in Probability Spaces via Optimal Transport
Antonio Terpin, Nicolas Lanzetti, Florian Dörfler
TL;DR
This work addresses discrete-time, finite-horizon optimal control in which the state is a probability measure over a ground space. It establishes a separation principle: the dynamic programming (DP) solution in probability spaces is constructed from a ground-space DP for individual agents and a single (multi-marginal) optimal transport (OT) problem that allocates agents to trajectories. The key contributions are a rigorous formulation that links the cost-to-go J_k to a multi-marginal OT problem with transport cost j_k, constructive procedures for optimal state-input distributions, and conditions that ensure existence and computability (a ground-space DP solved offline, plus an OT problem), along with the resulting offline/online computation trade-offs. The results unify and extend prior fleet-steering approaches by showing that they are special cases of the DP-in-probability-space framework, and they offer practical guidance for designing transport costs from learned cost-to-go terms. Through examples and proofs, the paper clarifies when two-marginal simplifications suffice and when multi-marginal formulations are indispensable, with implications for scalable multi-agent control and distribution steering.
Abstract
We study discrete-time finite-horizon optimal control problems in probability spaces, whereby the state of the system is a probability measure. We show that, in many instances, the solution of dynamic programming in probability spaces results from two ingredients: (i) the solution of dynamic programming in the "ground space" (i.e., the space on which the probability measures live) and (ii) the solution of an optimal transport problem. From a multi-agent control perspective, a separation principle holds: The "low-level control of the agents of the fleet" (how does one reach the destination?) and "fleet-level control" (who goes where?) are decoupled.
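
The separation described above can be illustrated with a toy fleet. The following is a minimal sketch under assumptions that are not taken from the paper: agents move on a one-dimensional integer grid with inputs in {-1, 0, 1}, the stage cost is |u|, the terminal condition is a hard constraint, and the initial and target measures are empirical measures with the same number of equally weighted atoms, so that the optimal transport problem reduces to an assignment problem. The function names `ground_cost_to_go` and `fleet_plan` are hypothetical, and the brute-force search over permutations is only a stand-in for a proper optimal transport solver. The sketch covers only the two-marginal case, in which the transport cost depends on the initial and terminal positions alone.

```python
from itertools import permutations

INF = float("inf")


def ground_cost_to_go(n_states, horizon, target):
    """Ground-space DP: minimal cost to sit exactly at `target` after `horizon` steps.

    Returns a list J where J[x] is the cost-to-go from state x at time 0.
    """
    # Terminal cost: zero at the target, infinite elsewhere (hard constraint).
    J = [0.0 if x == target else INF for x in range(n_states)]
    for _ in range(horizon):
        J_prev = []
        for x in range(n_states):
            best = INF
            for u in (-1, 0, 1):  # admissible inputs (assumed)
                nxt = x + u
                if 0 <= nxt < n_states:
                    # Bellman recursion with stage cost |u|.
                    best = min(best, abs(u) + J[nxt])
            J_prev.append(best)
        J = J_prev
    return J


def fleet_plan(starts, targets, n_states, horizon):
    """Fleet-level step: decide who goes where, given ground-space costs."""
    # Low-level control: one ground-space DP per distinct target.
    cost_to_target = {
        t: ground_cost_to_go(n_states, horizon, t) for t in set(targets)
    }
    # Transport cost c[i][j]: cost-to-go of agent i toward target j.
    c = [[cost_to_target[t][s] for t in targets] for s in starts]
    # Assignment by exhaustive search (only viable for tiny fleets).
    best_cost, best_perm = INF, None
    for perm in permutations(range(len(targets))):
        total = sum(c[i][perm[i]] for i in range(len(starts)))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm


total, assignment = fleet_plan(
    starts=[0, 1, 5], targets=[6, 2, 0], n_states=8, horizon=6
)
print(total, assignment)
```

Here `assignment[i]` is the index of the target allocated to agent i. The ground-space DP is solved once per target and never looks at the fleet, while the assignment step never looks at the dynamics; it only consumes the cost matrix. For larger fleets one would replace the exhaustive search with a dedicated assignment or optimal transport solver.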
