Deterministic Trajectory Optimization through Probabilistic Optimal Control
Mohammad Mahmoudi Filabadi, Tom Lefebvre, Guillaume Crevecoeur
TL;DR
The paper reframes deterministic trajectory optimization as a probabilistic inference problem under a risk-sensitive objective, enabling EM-based fixed-point iterations that yield probabilistic policies converging to the deterministic optimum. It introduces two derivative-free algorithms, SP-PDP and SP-BSC, that rely on sigma-point uncertainty propagation and Fourier-Hermite expansions to evaluate and update policies without gradients. Through backward-forward (SP-PDP) and forward-backward (SP-BSC) schemes, the methods balance exploration and exploitation, improving stability and convergence on nonlinear, high-dimensional systems. The approach demonstrates favorable performance on pendulum, cart-pole, and 6-DoF planning tasks, with automatic regularization via the prior policy and a tunable risk parameter that governs exploration intensity, suggesting practical advantages for robust trajectory optimization.
Abstract
In this article, we discuss two algorithms tailored to discrete-time deterministic finite-horizon nonlinear optimal control problems, also known as deterministic trajectory optimization problems. Both algorithms can be derived from an emerging theoretical paradigm that we refer to as probabilistic optimal control. The paradigm reformulates stochastic optimal control as an equivalent probabilistic inference problem and can be viewed as a generalisation of the former. The merit of this perspective is that it allows the problem to be addressed using the Expectation-Maximization algorithm. It is shown that applying this algorithm results in a fixed-point iteration of probabilistic policies that converge to the deterministic optimal policy. Two strategies for policy evaluation are discussed, each using state-of-the-art uncertainty quantification methods, resulting in two distinct algorithms. The algorithms are structurally most closely related to the differential dynamic programming algorithm and to related methods that use sigma-point methods to avoid direct gradient evaluations. The main advantage of the algorithms is an improved balance between exploration and exploitation over the iterations, leading to improved numerical stability and accelerated convergence. These properties are demonstrated on different nonlinear systems.
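To make the idea of sigma-point propagation concrete, the following is a minimal sketch of a generic unscented-style transform that pushes a Gaussian over the joint state and action through a nonlinear dynamics function without evaluating any gradients. It is not the policy-evaluation scheme of either algorithm discussed in the article. The function names (`cholesky`, `sigma_points`, `propagate`, `pendulum_step`), the weighting parameter `kappa`, and the pendulum model with its constants are all assumptions chosen for illustration.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def sigma_points(mean, cov, kappa=1.0):
    """Return 2n+1 symmetric sigma points and weights matching mean and cov."""
    n = len(mean)
    l = cholesky(cov)
    scale = math.sqrt(n + kappa)
    points = [list(mean)]
    weights = [kappa / (n + kappa)]
    for j in range(n):
        # Offset along the j-th column of the Cholesky factor.
        col = [scale * l[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, col)])
        points.append([m - c for m, c in zip(mean, col)])
        weights += [0.5 / (n + kappa)] * 2
    return points, weights


def propagate(f, mean, cov, kappa=1.0):
    """Approximate the mean and covariance of f(z) for z ~ N(mean, cov)."""
    points, weights = sigma_points(mean, cov, kappa)
    ys = [f(p) for p in points]
    m = len(ys[0])
    y_mean = [sum(w * y[i] for w, y in zip(weights, ys)) for i in range(m)]
    y_cov = [
        [
            sum(w * (y[i] - y_mean[i]) * (y[j] - y_mean[j])
                for w, y in zip(weights, ys))
            for j in range(m)
        ]
        for i in range(m)
    ]
    return y_mean, y_cov


def pendulum_step(z, dt=0.05):
    """Hypothetical Euler-discretised pendulum; z = [angle, angular rate, torque]."""
    theta, omega, u = z
    return [theta + dt * omega, omega + dt * (-9.81 * math.sin(theta) + u)]


# Joint Gaussian over (state, action), as induced by a Gaussian policy (assumed values).
z_mean = [0.3, 0.0, 0.1]
z_cov = [[0.01, 0.0, 0.0], [0.0, 0.04, 0.0], [0.0, 0.0, 0.25]]
next_mean, next_cov = propagate(pendulum_step, z_mean, z_cov)
```

Only function evaluations at the sigma points are required, which is what makes such schemes derivative-free. For a linear map the transform reproduces the exact mean and covariance; for nonlinear dynamics it is an approximation whose accuracy depends on the spread of the points.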
