Differential Dynamic Programming for the Optimal Control Problem with an Ellipsoidal Target Set and Its Statistical Inference
Sungjun Eom, Gyunghoon Park
TL;DR
This paper extends differential dynamic programming (DDP) to optimal control problems with an ellipsoidal target set ${\mathcal C}$ by reformulating the target via the orthogonal projection $P_{\mathcal C}(x)$ onto the ellipsoid and applying a local smoothing to manage nondifferentiability. It introduces ETS-DDP, a method that preserves the DDP structure while using smoothed cost contributions $L^{\mathcal C}(x,u)$ and $\phi^{\mathcal C}(x)$ derived from $P_{\mathcal C}(x)$, and couples it with a statistical mechanism for designing ${\mathcal C}$ from expert demonstrations, using the sample mean $\bar c$ and covariance $S$ and the radius $r=\sqrt{\chi^2_{\alpha}(n)}$. A simulation on autonomous parking shows that ETS-DDP yields admissible state trajectories more quickly than conventional point-target DDP, at the cost of some optimality. The approach has potential for real-time or low-communication control tasks where fast feasible trajectories are favored, and it invites extensions to non-ellipsoidal targets and other distance measures.
Abstract
This work addresses an extended class of optimal control problems in which the target for the system state is an ellipsoid rather than a single fixed point. As a computationally affordable method for solving the extended problem, we present a revised version of differential dynamic programming (DDP), termed differential dynamic programming with ellipsoidal target set (ETS-DDP). To this end, the problem with an ellipsoidal target set is reformulated into an equivalent form using the orthogonal projection operator, which yields cost functions that are discontinuous at some points. Since DDP usually requires differentiable cost functions, the ETS-DDP formulation locally approximates the (nonsmooth) cost functions by smoothed ones near the path generated at the previous iteration, by utilizing the orthogonal projection operator. Moreover, a statistical inference method is presented for designing the ellipsoidal target set, based on data on admissible target points collected from expert demonstrations. In a simulation on autonomous parking of a vehicle, the proposed ETS-DDP efficiently derives an admissible state trajectory while running much faster than the point-targeted DDP, at the expense of optimality.
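As a rough illustration of the statistical design step, the sketch below fits an ellipsoidal target set from demonstrated target points using the sample mean $\bar c$, the sample covariance $S$, and the radius $r=\sqrt{\chi^2_{\alpha}(n)}$ mentioned in the TL;DR. It is not the authors' implementation. Several things are assumptions made only for this example: the function names (`fit_ellipsoid_2d`, `mahalanobis_sq`, `in_target`) are hypothetical, the target points are two-dimensional ($n=2$), the set is taken to be $\{x : (x-\bar c)^\top S^{-1}(x-\bar c) \le r^2\}$, and $\chi^2_{\alpha}(n)$ is read as the upper-$\alpha$ quantile, which for $n=2$ has the closed form $-2\ln\alpha$.

```python
import math


def fit_ellipsoid_2d(points, alpha=0.05):
    """Fit (c_bar, S, r) from 2D demonstration points.

    Assumed set: {x : (x - c_bar)^T S^{-1} (x - c_bar) <= r^2}.
    """
    m = len(points)
    if m < 2:
        raise ValueError("need at least two demonstration points")

    # Sample mean c_bar.
    cx = sum(p[0] for p in points) / m
    cy = sum(p[1] for p in points) / m

    # Unbiased sample covariance S (divides by m - 1).
    sxx = sum((p[0] - cx) ** 2 for p in points) / (m - 1)
    syy = sum((p[1] - cy) ** 2 for p in points) / (m - 1)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / (m - 1)

    # Assumption: chi^2_alpha(n) is the upper-alpha quantile. For n = 2
    # the chi-square CDF is 1 - exp(-x / 2), so the quantile is -2 ln(alpha).
    r = math.sqrt(-2.0 * math.log(alpha))
    return (cx, cy), ((sxx, sxy), (sxy, syy)), r


def mahalanobis_sq(x, c, S):
    """Squared Mahalanobis distance (x - c)^T S^{-1} (x - c) for a 2x2 S."""
    dx, dy = x[0] - c[0], x[1] - c[1]
    (a, b), (_, d) = S
    det = a * d - b * b
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # Uses the explicit inverse of a symmetric 2x2 matrix.
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def in_target(x, c, S, r):
    """True if x lies inside the fitted ellipsoidal target set."""
    return mahalanobis_sq(x, c, S) <= r * r


# Hypothetical demonstration data: final positions chosen by an expert.
demos = [(1.0, 2.0), (1.2, 2.1), (0.8, 1.9), (1.1, 1.8), (0.9, 2.2)]
c_bar, S, r = fit_ellipsoid_2d(demos, alpha=0.05)
```

A candidate terminal state can then be checked with `in_target(x, c_bar, S, r)`. For state dimensions other than two, the closed-form quantile no longer applies and a chi-square quantile routine from a statistics library would be needed instead.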
