Improved Exploration for Safety-Embedded Differential Dynamic Programming Using Tolerant Barrier States

Joshua E. Kuperman, Hassan Almubarak, Augustinos D. Saravanos, Evangelos A. Theodorou

TL;DR

The paper tackles safe trajectory optimization under state constraints by marrying barrier-based safety embedding with tolerant exploration. It introduces Tolerant DBaS (T-DBaS), a barrier construction $\tilde{B}$ that combines sigmoid and softplus terms to allow temporary constraint violations, while preserving informative gradients inside unsafe regions. Embedded into Differential Dynamic Programming (DDP) as T-DBaS-DDP, the approach retains convergence properties and shows improved exploration in non-convex environments, validated on a differential-drive robot, a quadrotor, and multi-robot hardware experiments, with competitive performance versus Augmented-Lagrangian DDP. The results indicate that T-DBaS achieves faster convergence and safer goal-reaching in challenging obstacle configurations, suggesting strong potential for online MPC and learning-based tuning of barrier parameters in uncertainty-rich settings.

Abstract

In this paper, we introduce Tolerant Discrete Barrier States (T-DBaS), a novel safety-embedding technique for trajectory optimization with enhanced exploratory capabilities. The proposed approach generalizes the standard discrete barrier state (DBaS) method by accommodating temporary constraint violation during the optimization process while still approximating its safety guarantees. Consequently, the proposed approach eliminates the DBaS's safe nominal trajectories assumption, while enhancing its exploration effectiveness for escaping local minima. Towards applying T-DBaS to safety-critical autonomous robotics, we combine it with Differential Dynamic Programming (DDP), leading to the proposed safe trajectory optimization method T-DBaS-DDP, which inherits the convergence and scalability properties of the solver. The effectiveness of the T-DBaS algorithm is verified on differential drive robot and quadrotor simulations. In addition, we compare against the classical DBaS-DDP as well as Augmented-Lagrangian DDP (AL-DDP) in extensive numerical comparisons that demonstrate the proposed method's competitive advantages. Finally, the applicability of the proposed approach is verified through hardware experiments on the Georgia Tech Robotarium platform.

Paper Structure (14 sections, 1 theorem, 18 equations, 12 figures)

Key Result

Proposition 1

Under the control sequence $U(x)$, the safe set $\mathcal{S}$ is controlled forward invariant if and only if $\beta(x(0)) < \infty \Rightarrow \beta_k <\infty \ \forall k \in [1, T]$.
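
The finiteness condition in Proposition 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the single-integrator dynamics, the circular obstacle of radius 1 at the origin, the names `h`, `inverse_barrier`, `rollout_barrier_states`, and `is_safe`, and the convention that the barrier takes the value +inf whenever $h(x) \le 0$.

```python
import math

def h(x, center=(0.0, 0.0), radius=1.0):
    """Hypothetical safety function: positive outside a circular obstacle."""
    dx, dy = x[0] - center[0], x[1] - center[1]
    return dx * dx + dy * dy - radius * radius

def inverse_barrier(hx):
    """Inverse barrier 1/h in the safe set; +inf otherwise (assumed convention)."""
    return 1.0 / hx if hx > 0.0 else math.inf

def rollout_barrier_states(x0, controls, dt=0.1):
    """Propagate x_{k+1} = x_k + dt * u_k and record beta_k = B(h(x_k))."""
    x = tuple(x0)
    betas = [inverse_barrier(h(x))]
    for u in controls:
        x = (x[0] + dt * u[0], x[1] + dt * u[1])
        betas.append(inverse_barrier(h(x)))
    return betas

def is_safe(betas):
    """The trajectory stays in the safe set iff every barrier state is finite."""
    return all(math.isfinite(b) for b in betas)

# A straight path passing above the obstacle, and one driving through it.
safe_betas = rollout_barrier_states((-3.0, 2.0), [(1.0, 0.0)] * 60)
unsafe_betas = rollout_barrier_states((-3.0, 0.0), [(1.0, 0.0)] * 60)
print(is_safe(safe_betas), is_safe(unsafe_betas))
```

Under these assumptions, checking safety of a whole trajectory reduces to checking that one scalar sequence stays finite, which is what lets the constraint be folded into the cost of an unconstrained solver such as DDP.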

Figures (12)

  • Figure 1: The top figure shows an example of the tolerant barrier alongside the inverse and log barriers. The bottom figures show the barrier functions in 3D to better illustrate their gradients. The inverse barrier (left) and the tolerant barrier (right) treat the unsafe region (circle of $r=1$) differently, with the z-axis denoting the value of the barrier. Note that the discontinuity in the classical barrier function causes numerical instability when presented with unsafe initial conditions, whereas the tolerant barrier smoothly penalizes unsafe starts while also approximating the classical barrier in the safe region.
  • Figure 2: Intermediate solutions for T-DBaS-DDP and DBaS-DDP. The former method avoids the local minimum by taking advantage of the constraint gradient in the unsafe set, while the latter is unable to provide a trajectory that reaches the goal.
  • Figure 3: Experiment on the Georgia Tech Robotarium in which two teams of two agents each switch positions while maintaining connectivity within their teams and avoiding obstacles. The circles around the agents indicate the connectivity area of each agent and the dotted lines indicate the traveled trajectories. A video of the experiment can be found at https://youtu.be/9ZRBHZfjKPY.
  • Figure 6: Seconds per iteration with standard deviation for each algorithm, collected on 10 differential drive scenarios per data point. Collected on an M1 processor.
  • Figure : a) Diff. Drive
  • ...and 7 more figures
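
The contrast drawn in the Figure 1 caption, between a barrier that breaks down at the constraint boundary and one that stays smooth inside the unsafe region, can be sketched numerically. The function `smooth_barrier` below is a hypothetical stand-in and is not the paper's $\tilde{B}$, whose exact form is not reproduced in this summary. It simply replaces $h$ in the inverse barrier $1/h$ with a scaled softplus, whose derivative is a sigmoid. The sharpness parameter `k` and all function names are assumptions.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def inverse_barrier(h):
    """Classical inverse barrier; undefined at h = 0, sign flip for h < 0."""
    return 1.0 / h

def smooth_barrier(h, k=5.0):
    """Hypothetical smooth surrogate: 1 / (softplus(k*h) / k).

    Close to 1/h when k*h is large and positive; finite and growing for
    moderately negative h (very negative k*h underflows softplus to 0).
    """
    return k / softplus(k * h)

def smooth_barrier_grad(h, k=5.0):
    """d/dh of smooth_barrier; the softplus derivative is a sigmoid."""
    sp = softplus(k * h)
    return -k * k * sigmoid(k * h) / (sp * sp)

for h in (-1.0, -0.5, 0.5, 2.0):
    print(h, inverse_barrier(h), smooth_barrier(h), smooth_barrier_grad(h))
```

In this sketch the surrogate is finite and strictly decreasing in $h$ over the sampled range, so its gradient inside the unsafe region still points toward the safe set, while the plain inverse barrier changes sign there and gives no such signal.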

Theorems & Definitions (3)

  • Definition 1
  • Proposition 1
  • Remark