How to Train Your Latent Control Barrier Function: Smooth Safety Filtering Under Hard-to-Model Constraints

Kensuke Nakamura, Arun L. Bishop, Steven Man, Aaron M. Johnson, Zachary Manchester, Andrea Bajcsy

TL;DR

This work tackles safe visuomotor control when safety constraints are hard to model and perception is high-dimensional. It identifies two gaps that hinder smooth, optimization-based safety filtering: non-smooth margins from classifier-based failure encodings, and a distribution mismatch between safety training data and deployment actions. To address them, the authors introduce LatentCBF, which combines gradient-penalized, Lipschitz-constrained margin learning inspired by Wasserstein disentanglement with a mixed-policy RL training regimen, yielding a latent, discrete-time CBF that can modulate a nominal policy in real time. Results in simulation and on hardware show that LatentCBF yields smoother safety interventions and substantially improves safe-task success (e.g., 80% vs 38% on hardware) while maintaining high safety rates. It also scales to thousands of candidate actions with ~10 ms latency, offering practical safety for vision-based manipulation without hand-crafted models or full state observability.

Abstract

Latent safety filters extend Hamilton-Jacobi (HJ) reachability to operate on latent state representations and dynamics learned directly from high-dimensional observations, enabling safe visuomotor control under hard-to-model constraints. However, existing methods implement "least-restrictive" filtering that discretely switches between nominal and safety policies, potentially undermining the task performance that makes modern visuomotor policies valuable. While reachability value functions can, in principle, be adapted to be control barrier functions (CBFs) for smooth optimization-based filtering, we theoretically and empirically show that current latent-space learning methods produce fundamentally incompatible value functions. We identify two sources of incompatibility. First, in HJ reachability, failures are encoded via a "margin function" in latent space, whose sign indicates whether or not a latent is in the constraint set. However, representing the margin function as a classifier yields saturated value functions that exhibit discontinuous jumps. We prove that the value function's Lipschitz constant scales linearly with the margin function's Lipschitz constant, revealing that smooth CBFs require smooth margins. Second, reinforcement learning (RL) approximations trained solely on safety policy data yield inaccurate value estimates for nominal policy actions, precisely where CBF filtering needs them. We propose LatentCBF, which addresses both challenges through gradient penalties that lead to smooth margin functions without additional labeling, and a value-training procedure that mixes data from both nominal and safety policy distributions. Experiments on simulated benchmarks and hardware with a vision-based manipulation policy demonstrate that LatentCBF enables smooth safety filtering while doubling the task-completion rate over prior switching methods.
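The contrast between least-restrictive switching and CBF-style filtering can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 1D state, the barrier `h`, the single-integrator dynamics `f`, the fallback action, and the thresholds `eps` and `gamma`. The filter uses the standard discrete-time CBF condition h(f(s, a)) >= (1 - gamma) * h(s) and picks, from a grid of candidate actions, the feasible one closest to the nominal action. In the paper the barrier is a learned latent value function, not a hand-written one.

```python
import numpy as np

def h(s):
    # Hypothetical barrier function: the safe set is {s : s <= 1}, so h(s) >= 0 inside it.
    return 1.0 - s

def f(s, a, dt=0.1):
    # Hypothetical discrete-time dynamics: a single integrator.
    return s + dt * a

def least_restrictive_filter(s, a_nom, a_safe=-1.0, eps=0.05):
    # Switching filter: keep the nominal action unless its next state falls
    # within eps of the boundary, then hand over fully to the safety action.
    return a_nom if h(f(s, a_nom)) > eps else a_safe

def cbf_filter(s, a_nom, gamma=0.5, n_candidates=1001, a_max=1.0):
    # Sampling-based CBF filter: among candidates satisfying
    # h(f(s, a)) >= (1 - gamma) * h(s), return the one closest to a_nom.
    candidates = np.linspace(-a_max, a_max, n_candidates)
    h_next = h(f(s, candidates))
    feasible = h_next >= (1.0 - gamma) * h(s)
    if not feasible.any():
        # No candidate satisfies the condition: fall back to the safest one.
        return float(candidates[np.argmax(h_next)])
    safe_actions = candidates[feasible]
    return float(safe_actions[np.argmin(np.abs(safe_actions - a_nom))])

# Compare both filters as the state approaches the boundary at s = 1.
for s in (0.5, 0.8, 0.9):
    print(s, least_restrictive_filter(s, 1.0), cbf_filter(s, 1.0))
```

In this toy setup the switching filter jumps from the nominal action straight to the full safety action near the boundary, while the CBF-style filter only scales the nominal action back as far as the condition requires.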

Paper Structure

This paper contains 18 sections, 2 theorems, 19 equations, 3 figures, 10 tables.

Key Result

theorem 1

Let the margin function $\ell(s)$ and time-discounted HJ value function $V^{*}(s)$ be Lipschitz continuous with constants $L_\ell$ and $L_{V^{*}}$, respectively. Let the discrete-time dynamics $f(s, a)$ be uniformly Lipschitz in $s$ with constant $L_f$ such that for a fix

Figures (3)

  • Figure 1: Safety Filtering a Visuomotor Manipulation Policy. Both the nominal policy and the safety filters take as input the RGB images shown on the bottom. (left) Unfiltered nominal diffusion policy spills the bag's contents. (center) Least-restrictive latent safety filter prevents spilling, but also stops the diffusion policy from lifting the bag. (right) Our LatentCBF guides the diffusion policy to a safe grasp and it completes the pickup task.
  • Figure 2: CBFs as a Function of the Margin Function. Even with a perfect model, a classifier-based $\ell(s)$ yields a CBF with poor signal during action filtering (left). A smooth margin function provides a rich signal for the CBF to evaluate alternative actions (right).
  • Figure 3: Hardware: End-effector Trajectories in the X-Z Plane. LatentCBF maintains control authority and safety compared to LR and unfiltered baselines. Without fitting the critic as in \ref{sec:how-to-train}, LatentCBF-NoMix permits erratic actions and spills in 20% of trials.
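The point of Figure 2, that a classifier-style margin gives the filter little signal for ranking actions, can be reproduced in a toy setting. The sketch below is illustrative only: the two margin functions, the 1D state, and the step size are hypothetical choices, not the paper's learned models. It compares how much the margin varies across the next states reachable from one safe state, and estimates each margin's Lipschitz constant by finite differences.

```python
import numpy as np

def classifier_margin(s, sharpness=50.0):
    # Hypothetical classifier-style margin: saturates near +1 / -1 away from
    # the failure boundary at s = 1.
    return np.tanh(sharpness * (1.0 - s))

def smooth_margin(s):
    # Hypothetical signed-distance-style margin with Lipschitz constant 1.
    return 1.0 - s

def empirical_lipschitz(fn, lo=0.0, hi=2.0, n=2001):
    # Largest finite-difference slope on a grid: a lower bound on the true
    # Lipschitz constant of fn over [lo, hi].
    s = np.linspace(lo, hi, n)
    return float(np.max(np.abs(np.diff(fn(s)) / np.diff(s))))

def action_signal(fn, s=0.5, dt=0.1, n_actions=5):
    # Spread of margin values over the next states s + dt * a reachable by a
    # few candidate actions; zero spread means the actions cannot be ranked.
    a = np.linspace(-1.0, 1.0, n_actions)
    vals = fn(s + dt * a)
    return float(vals.max() - vals.min())

for name, fn in (("classifier", classifier_margin), ("smooth", smooth_margin)):
    print(name, empirical_lipschitz(fn), action_signal(fn))
```

Away from the boundary the saturated margin assigns essentially the same value to every candidate action, while its slope near the boundary is large. The smooth margin keeps a modest slope everywhere and still separates the candidates.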

Theorems & Definitions (4)

  • definition 1: Discrete-time Control Barrier Function [agrawal2017discrete]
  • definition 2: Lipschitz Continuity
  • theorem 1: Margin-to-Value Lipschitz Bound
  • theorem 2: Margin-to-Value Lipschitz Bound
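Since the bound above ties the smoothness of the value function to the Lipschitz constant of the margin, one way to encourage a small constant is a penalty on the norm of the margin's input gradient. The sketch below is a generic illustration under stated assumptions, not the paper's loss: the tiny tanh MLP, the one-sided penalty form, and the `target` norm of 1 are all hypothetical choices (WGAN-GP, for comparison, uses a two-sided penalty). The gradient is written analytically so that only NumPy is needed.

```python
import numpy as np

def margin(z, W1, b1, w2, b2):
    # Hypothetical margin network: one tanh hidden layer, z of shape (N, d) -> (N,).
    hidden = np.tanh(z @ W1 + b1)
    return hidden @ w2 + b2

def margin_input_grad(z, W1, b1, w2, b2):
    # Analytic gradient of the margin with respect to its input z, shape (N, d).
    hidden = np.tanh(z @ W1 + b1)           # (N, H)
    dhidden = (1.0 - hidden ** 2) * w2      # chain rule through tanh, (N, H)
    return dhidden @ W1.T                   # (N, d)

def gradient_penalty(z, params, target=1.0):
    # One-sided penalty: only gradient norms above `target` are penalized,
    # which pushes the margin toward being `target`-Lipschitz on the samples.
    norms = np.linalg.norm(margin_input_grad(z, *params), axis=1)
    return float(np.mean(np.maximum(norms - target, 0.0) ** 2))

# Example with hand-picked weights: the gradient at z = 0 is [3, 4], norm 5.
steep = (np.eye(2), np.zeros(2), np.array([3.0, 4.0]), 0.0)
gentle = (np.eye(2), np.zeros(2), np.array([0.3, 0.4]), 0.0)
z = np.zeros((1, 2))
print(gradient_penalty(z, steep), gradient_penalty(z, gentle))
```

In training, a term like this would be added to the margin's classification loss with some weight, and evaluated on sampled latents; an autodiff framework would replace the hand-derived gradient.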