
Safe Policy Optimization via Control Barrier Function-based Safety Filters

Yiting Chen, Pol Mestres, Emiliano Dall'Anese, Jorge Cortés

Abstract

Control barrier function (CBF)-based safety filters provide a systematic way to enforce state constraints, but they can significantly alter the closed-loop dynamics induced by a nominal, stabilizing controller. In particular, the resulting safety-filtered system may exhibit undesirable behaviors including limit cycles, unbounded trajectories, and undesired equilibria. This paper develops a policy optimization framework to maximally enhance the stability properties of safety-filtered controllers. Focusing on linear systems with linear nominal controllers, we jointly parameterize the nominal feedback gain and safety-filter components, and optimize them using trajectory-based objectives computed from closed-loop rollouts. To ensure that the nominal controller remains stabilizing throughout training, we encode Lyapunov-based stability conditions as smooth scalar constraints and enforce them using robust safe gradient flow. This guarantees feasibility of the stability constraints along the optimization iterates and therefore avoids instability during training. Numerical experiments on obstacle-avoidance problems show that the proposed approach can remove asymptotically stable undesired equilibria and improve convergence behavior while maintaining forward invariance of the safe set.
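To make the safety-filter setting concrete, the following is a minimal sketch of a standard CBF quadratic-program (QP) safety filter in closed form, assuming a single-integrator system $\dot{x} = u$ and a single circular obstacle. The function name, obstacle model, and nominal controller here are illustrative, not the paper's parameterization:

```python
import numpy as np

def cbf_qp_filter(x, u_nom, c, r, alpha=1.0):
    """Closed-form CBF-QP safety filter for a single integrator x' = u
    with a circular obstacle encoded by h(x) = ||x - c||^2 - r^2 >= 0.
    Solves min ||u - u_nom||^2 s.t. grad h(x) . u >= -alpha * h(x)."""
    h = np.dot(x - c, x - c) - r**2
    a = 2.0 * (x - c)                # gradient of h at x
    slack = a @ u_nom + alpha * h    # CBF constraint evaluated at u_nom
    if slack >= 0.0:
        return u_nom                 # nominal input is already safe
    # Minimal-norm correction projecting u_nom onto the constraint boundary.
    return u_nom - slack / (a @ a) * a

# Nominal stabilizing controller u = -x driving the state to the origin.
x = np.array([1.5, 0.1])
u_safe = cbf_qp_filter(x, -x, c=np.array([1.0, 0.0]), r=0.5)
```

With a single affine constraint, the QP admits this closed-form projection; it is exactly the kind of pointwise modification of the nominal input whose effect on the closed-loop equilibria the paper's policy optimization targets.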

Paper Structure

This paper contains 14 sections, 3 theorems, 40 equations, 3 figures, 1 algorithm.

Key Result

Lemma 1

Let $M\in \mathbb{R}^{n\times n}$ be a symmetric matrix. Then $M$ is positive definite if and only if all leading principal minors of $M$ are positive, i.e., $\det(M_{(1:i,\,1:i)}) > 0$ for all $i\in[n]$. $\Box$
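The lemma is Sylvester's criterion, which underlies the smooth scalar encoding of positive-definiteness constraints in the optimization. A minimal numerical sketch of the test (the function name is illustrative):

```python
import numpy as np

def is_positive_definite(M):
    """Sylvester's criterion: a symmetric matrix M is positive definite
    iff every leading principal minor det(M[:i, :i]) is positive."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    return all(np.linalg.det(M[:i, :i]) > 0.0 for i in range(1, n + 1))

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 1 and 3
B = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues -1 and 3
print(is_positive_definite(A))  # True
print(is_positive_definite(B))  # False
```

Because each minor is a smooth function of the matrix entries, the $n$ inequalities $\det(M_{(1:i,\,1:i)}) > 0$ can serve as differentiable scalar constraints during gradient-based training, which is how the paper keeps the Lyapunov conditions enforceable along the optimization iterates.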

Figures (3)

  • Figure F1: State trajectories generated by the initial (left) and trained (right) controllers. Gray regions indicate unsafe sets. Under the initial controller, two undesired equilibria appear on the boundary of the safe set, one of which is asymptotically stable and its region of attraction has a positive measure. After training, no undesired equilibrium is observed, and all trajectories remain within the safe set and converge to the origin.
  • Figure F2: State trajectories generated by the initial (left) and trained (right) controllers. Gray regions indicate unsafe sets. The initial controller induces an asymptotically stable undesirable equilibrium on the boundary of the obstacle, causing some trajectories to converge to the unsafe set. After training, the asymptotically stable undesirable equilibrium is eliminated, and the resulting controller keeps all trajectories within the safe set while yielding improved convergence behavior.
  • Figure F3: State trajectories generated by the initial (left) and trained (right) controllers. Gray regions indicate unsafe sets. Given the same set of initial conditions, the trajectories under the trained controller converge to the origin while ensuring obstacle avoidance, whereas many trajectories under the initial controller converge to undesirable equilibria located near the top-right corner and on the boundaries of the ellipsoidal obstacles.

Theorems & Definitions (5)

  • Lemma 1: horn2012matrix
  • Remark 1
  • Proposition 1
  • proof
  • Lemma 2: SB-LV:09