Deterministic Trajectory Optimization through Probabilistic Optimal Control

Mohammad Mahmoudi Filabadi; Tom Lefebvre; Guillaume Crevecoeur

TL;DR

The paper reframes deterministic trajectory optimization as a probabilistic inference problem under the risk-sensitive objective, enabling EM-based fixed-point iterations that yield probabilistic policies converging to the deterministic optimum. It introduces two derivative-free algorithms, SP-PDP and SP-BSC, that rely on sigma-point uncertainty propagation and Fourier-Hermite expansions to evaluate and update policies without gradients. Through backward-forward (SP-PDP) and forward-backward (SP-BSC) schemes, the methods balance exploration and exploitation, improving stability and convergence on nonlinear, high-dimensional systems. The approach demonstrates favorable performance on pendulum, cart-pole, and 6-DoF planning tasks, with automatic regularization via the prior policy and a tunable risk parameter that governs exploration intensity, suggesting practical advantages for robust trajectory optimization.
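The TL;DR mentions Fourier-Hermite expansions as a way to update policies without gradients. Below is a minimal one-dimensional sketch of the underlying idea. It is not the authors' implementation, and the name `fourier_hermite_coeffs` and its parameters are hypothetical.

The sketch expands $f(m + s\xi)$, with $\xi \sim \mathcal{N}(0,1)$, in probabilists' Hermite polynomials. It computes the coefficients from function evaluations alone, using Gauss-Hermite quadrature. By Stein's identity, the first- and second-order coefficients equal $s\,\mathbb{E}[f'(x)]$ and $s^2\,\mathbb{E}[f''(x)]/2$. In other words, they carry gradient and Hessian information averaged over the distribution.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def fourier_hermite_coeffs(f, mean, std, order=2, n_points=20):
    """Coefficients c_k of f(mean + std*xi) ~ sum_k c_k He_k(xi), xi ~ N(0, 1).

    c_k = E[f(mean + std*xi) He_k(xi)] / k!, evaluated with Gauss-Hermite
    quadrature, so only evaluations of f are needed (no derivatives).
    """
    xi, w = hermegauss(n_points)      # nodes/weights for weight exp(-xi^2 / 2)
    w = w / w.sum()                   # normalise to a standard-normal expectation
    fx = f(mean + std * xi)
    coeffs = []
    for k in range(order + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                # coefficient vector selecting He_k
        he_k = hermeval(xi, basis)
        coeffs.append(np.sum(w * fx * he_k) / math.factorial(k))
    return np.array(coeffs)


coeffs = fourier_hermite_coeffs(lambda x: x ** 3, mean=1.0, std=0.5)
expected_grad = coeffs[1] / 0.5               # E[f'(x)] = 3(m^2 + s^2) = 3.75
expected_hess = 2.0 * coeffs[2] / 0.5 ** 2    # E[f''(x)] = 6m = 6.0
print(coeffs, expected_grad, expected_hess)
```

Take $f(x) = x^3$ with $m = 1$ and $s = 0.5$. The first coefficient then recovers the averaged derivative without differentiating $f$. With 20 nodes the rule is exact for polynomials up to degree 39, so this example is exact up to rounding.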

Abstract

In this article, we discuss two algorithms tailored to discrete-time, deterministic, finite-horizon nonlinear optimal control problems, also known as deterministic trajectory optimization problems. Both algorithms can be derived from an emerging theoretical paradigm that we refer to as probabilistic optimal control. This paradigm can be viewed as a generalisation of stochastic optimal control: it reformulates stochastic optimal control as an equivalent probabilistic inference problem. The merit of this perspective is that it allows the problem to be addressed with the Expectation-Maximization algorithm. It is shown that applying this algorithm yields a fixed-point iteration whose probabilistic policies converge to the deterministic optimal policy. Two strategies for policy evaluation are discussed, each based on state-of-the-art uncertainty quantification methods and each resulting in a distinct algorithm. Structurally, the algorithms are most closely related to differential dynamic programming and to related methods that use sigma-point methods to avoid direct gradient evaluations. Their main advantage is a better balance between exploration and exploitation over the iterations, which improves numerical stability and accelerates convergence. These properties are demonstrated on several nonlinear systems.
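To make "a fixed-point iteration of probabilistic policies" concrete, here is a toy sketch under strong simplifying assumptions: a scalar action, a single stage, a Gaussian policy and no dynamics. It is an analogue of the idea, not the paper's EM algorithm, and the helper names are hypothetical.

Each step treats the current Gaussian policy as the prior. It reweights that prior by $\exp(-\gamma\, c(u))$, a posterior-like step, and then refits a Gaussian by moment matching. As a result, the previous policy regularises every update.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def reweight_and_refit(mean, std, cost, gamma, n_points=20):
    """One toy fixed-point step for a scalar Gaussian policy N(mean, std^2).

    The current policy is the prior; it is reweighted by exp(-gamma * cost)
    and a Gaussian is refitted by moment matching. Expectations use
    Gauss-Hermite quadrature (hypothetical helper, not the paper's method).
    """
    xi, w = hermegauss(n_points)
    u = mean + std * xi                        # quadrature nodes in action space
    c = cost(u)
    w = w * np.exp(-gamma * (c - c.min()))     # shift by the minimum for stability
    w = w / w.sum()
    new_mean = np.sum(w * u)
    new_std = np.sqrt(np.sum(w * (u - new_mean) ** 2))
    return new_mean, new_std


def iterate_policy(mean, std, cost, gamma, n_iter):
    """Repeat the step; each refitted policy becomes the next prior."""
    history = [(mean, std)]
    for _ in range(n_iter):
        mean, std = reweight_and_refit(mean, std, cost, gamma)
        history.append((mean, std))
    return history


def quadratic_cost(u):
    return (u - 2.0) ** 2


history = iterate_policy(0.0, 1.0, quadratic_cost, gamma=1.0, n_iter=500)
print(history[-1])   # mean approaches 2.0, standard deviation shrinks toward 0
```

For the quadratic cost $(u-2)^2$ the exact reweighted density stays Gaussian, and its precision grows by $2\gamma$ per step. Starting from $\mathcal{N}(0,1)$ with $\gamma = 1$, after $k$ steps the mean is therefore $4k/(1+2k)$ and the variance is $1/(1+2k)$. The policy contracts onto the deterministic minimiser $u^\ast = 2$, and a larger $\gamma$ contracts it faster, which means less exploration.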

Paper Structure

This paper contains 26 sections, 11 theorems, 87 equations, 5 figures, 3 tables, and 2 algorithms.

Introduction
Background
Notation
Problem formulation
Dynamic programming
Differential dynamic programming
Probabilistic Optimal Control
Problem reformulation
Equivalent Maximum Likelihood Estimation problem
Expectation-Maximization
Evaluation of the optimal policies
Probabilistic Dynamic Programming
Bayesian Smoothing Control
Numerical implementation
Sigma-Point Probabilistic Dynamic Programming
...and 11 more sections

Key Result

Lemma 1

Problems (eq:general_min_cost_functional) and (eq:risk_sensitivity_objective), i.e. the standard cost-minimization problem and its risk-sensitive counterpart, are equivalent for deterministic dynamics, regardless of the value of $\gamma\in\mathbb{R}^+_*$.
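The following is a numerical illustration of why $\gamma$ drops out. It is a sketch, not the proof of Lemma 1.

We assume the risk-sensitive objective takes the common certainty-equivalent form $-\frac{1}{\gamma}\log\mathbb{E}[\exp(-\gamma J)]$ of the trajectory cost $J$. This sign convention is our assumption. When the dynamics and the policy are deterministic, $J$ is a single number, so this value equals $J$ for every $\gamma$. Only a random cost makes it depend on $\gamma$. The function name `certainty_equivalent` is hypothetical.

```python
import numpy as np


def certainty_equivalent(costs, gamma, probs=None):
    """-(1/gamma) * log E[exp(-gamma * J)] for a discrete cost distribution.

    The sign convention (risk-seeking form) is an assumption for illustration.
    """
    costs = np.asarray(costs, dtype=float)
    if probs is None:
        probs = np.full(costs.shape, 1.0 / costs.size)
    else:
        probs = np.asarray(probs, dtype=float)
    a = -gamma * costs
    a_max = a.max()                              # log-sum-exp shift for stability
    return -(a_max + np.log(np.sum(probs * np.exp(a - a_max)))) / gamma


for gamma in (0.1, 1.0, 10.0):
    print(gamma, certainty_equivalent([3.0], gamma))   # deterministic cost: 3.0 (up to rounding)
print(certainty_equivalent([1.0, 5.0], 1.0))           # random cost: about 1.675, below the mean 3.0
print(certainty_equivalent([1.0, 5.0], 1e-6))          # small gamma: close to the mean 3.0
```

The log-sum-exp shift keeps `np.exp` from underflowing for large $\gamma$ or large costs. As $\gamma \to 0$ the value tends to the mean cost.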

Figures (5)

Figure 1: The probabilistic model used to represent the optimal control problem. White-shaded variables are latent or hidden.
Figure 2: Total cost per iteration for the pendulum swing-up experiment. In (a), the results of SP-PDP are compared with those of SP-DP, using the UT3, UT5, and GH cubature rules. In (b), the results of SP-BSC using the UT3, UT5, and GH cubature rules are compared with the best results obtained with SP-DP.
Figure 3: The optimized trajectory and the sampled points of 60 trajectory rollouts in the state-space at each iteration of the SP-PDP using the GH cubature rule.
Figure 4: Total cost per iteration for (a) the cart-pole swing-up experiment and (b) the 6-DoF robot motion planning experiment.
Figure 5: Visualisation of the convergence of the robot joint angle trajectories with the SP-BSC method using the UT3 rule. The dotted, solid, and dashed lines indicate the first iteration, intermediate iterations, and the 150th iteration, respectively.
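Figure 2 compares the UT3, UT5, and GH cubature rules. The sketch below contrasts a generic symmetric unscented rule with a tensor-product Gauss-Hermite rule for propagating a Gaussian through a nonlinear function.

It assumes that UT3 corresponds to a standard third-degree unscented transform. The paper's exact rule parameters, and UT5, are not reproduced here, and all function names are hypothetical.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def unscented_points(mean, cov, kappa=0.0):
    """Symmetric unscented rule with 2n+1 points (centre weight kappa/(n+kappa))."""
    n = mean.size
    L = np.linalg.cholesky(cov)              # cov = L @ L.T
    offsets = np.sqrt(n + kappa) * L.T       # row j is scaled column j of L
    points = np.vstack([mean, mean + offsets, mean - offsets])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return points, weights


def gauss_hermite_points(mean, cov, order=10):
    """Tensor-product Gauss-Hermite rule with order**n points."""
    n = mean.size
    L = np.linalg.cholesky(cov)
    xi_1d, w_1d = hermegauss(order)          # weight function exp(-xi^2 / 2)
    w_1d = w_1d / w_1d.sum()                 # normalise to standard-normal weights
    xi = np.array(list(itertools.product(xi_1d, repeat=n)))
    weights = np.prod(np.array(list(itertools.product(w_1d, repeat=n))), axis=1)
    return mean + xi @ L.T, weights


def propagate(f, points, weights):
    """Approximate E[f(x)] by a weighted sum of f over the quadrature points."""
    values = np.array([f(p) for p in points])
    return weights @ values


def f(x):
    return np.array([np.sin(x[0]), x[0] * x[1]])


mean = np.array([0.5, -1.0])
cov = np.array([[0.3, 0.1], [0.1, 0.2]])
# Closed forms: E[sin x0] = sin(mu0) exp(-var0 / 2), E[x0 x1] = mu0 mu1 + cov01
exact = np.array([np.sin(0.5) * np.exp(-0.15), 0.5 * -1.0 + 0.1])
ut = propagate(f, *unscented_points(mean, cov))
gh = propagate(f, *gauss_hermite_points(mean, cov))
print("exact", exact, "UT", ut, "GH", gh)
```

The two rules trade cost against accuracy:

- The unscented rule uses $2n+1$ points (5 here). It is exact for polynomials up to degree 3, so the bilinear term matches exactly while $\sin$ is only approximated.
- The Gauss-Hermite rule needs $p^n$ points (100 here) but is far more accurate.

This trade-off grows quickly with the state dimension $n$.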

Theorems & Definitions (35)

Remark 1
Lemma 1
proof
Remark 2
Theorem 1
proof
Lemma 2
proof
Lemma 3
proof
...and 25 more