
Estimating Control Barriers from Offline Data

Hongzhan Yu, Seth Farrell, Ryo Yoshimitsu, Zhizhen Qin, Henrik I. Christensen, Sicun Gao

TL;DR

The paper tackles learning neural control barrier functions (CBFs) from offline, sparsely labeled data. It introduces an offline framework in which annotation based on out-of-distribution (OOD) analysis propagates information from a small set of labeled samples to unlabeled trajectories. The framework combines a rejection-based OOD detector, a maximally-safe actor that generates in-distribution controls, and a barrier-critic objective that enforces the CBF Lie-derivative condition. It also uses a surrogate normalization to prevent training collapse. The approach is validated in simulation and on real hardware. It achieves state-of-the-art dynamic obstacle avoidance with limited offline data and produces safer, less conservative maneuvers than existing offline methods. The zero-superlevel set $\{x: B(x)\ge 0\}$ is maintained as a control-invariant region: from any state inside the learned boundary, some admissible control keeps the system inside it.
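The barrier-critic objective is only named above. The sketch below shows the common hinge-loss form of CBF training. It pushes labeled unsafe states to $B(x) \le -m$ and labeled safe states to $B(x) \ge m$. It also penalizes violations of a finite-difference version of the Lie-derivative condition, $\frac{B(x_{t+1}) - B(x_t)}{\Delta t} + \alpha B(x_t) \ge 0$. Several pieces are hypothetical: the function name `barrier_losses`, the margin $m$, the linear choice of $\alpha$, and the assumption that $x_{t+1}$ comes from applying the actor's control. The paper's actual objective and its surrogate normalization are not reproduced here.

```python
import numpy as np


def _mean(a):
    # Mean that returns 0.0 for an empty array instead of NaN.
    return float(a.mean()) if a.size else 0.0


def barrier_losses(b_labeled, is_unsafe, b_t, b_next, dt, alpha=1.0, margin=0.0):
    """Hinge-style CBF training losses (a generic sketch, not the paper's objective).

    b_labeled    : B(x) evaluated at the sparsely labeled states.
    is_unsafe    : True where a labeled state is unsafe (e.g., a collision).
    b_t, b_next  : B(x_t) and B(x_{t+1}); x_{t+1} is ASSUMED to result from
                   applying the actor's control to x_t.
    """
    b_labeled = np.asarray(b_labeled, dtype=float)
    is_unsafe = np.asarray(is_unsafe, dtype=bool)
    b_t = np.asarray(b_t, dtype=float)
    b_next = np.asarray(b_next, dtype=float)

    # Unsafe states should lie outside the zero-superlevel set: B(x) <= -margin.
    unsafe = np.maximum(0.0, b_labeled[is_unsafe] + margin)
    # Safe states should lie inside it: B(x) >= margin.
    safe = np.maximum(0.0, margin - b_labeled[~is_unsafe])
    # Finite-difference Lie-derivative condition: dB/dt + alpha * B >= 0.
    lie = (b_next - b_t) / dt + alpha * b_t
    deriv = np.maximum(0.0, -lie)

    return {"unsafe": _mean(unsafe), "safe": _mean(safe), "lie": _mean(deriv)}
```

Consider `barrier_losses([0.5, -0.2, 0.3, 0.1], [False, True, True, False], [1.0, 0.5], [0.95, 0.2], dt=0.1)`. It penalizes the unsafe state with $B = 0.3$. It also penalizes the transition whose barrier drops from 0.5 to 0.2 in one step. The remaining terms contribute zero loss.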

Abstract

Learning-based methods for constructing control barrier functions (CBFs) are gaining popularity for ensuring safe robot control. A major limitation of existing methods is their reliance on extensive sampling over the state space or on online system interaction in simulation. In this work we propose a novel framework for learning neural CBFs from a fixed, sparsely-labeled dataset collected prior to training. Our approach introduces new annotation techniques based on out-of-distribution analysis, enabling efficient knowledge propagation from the limited labeled data to the unlabeled data. We also eliminate the dependency on a high-performance expert controller, allowing multiple sub-optimal policies or even manual control during data collection. We evaluate the proposed method on real-world platforms. With a limited amount of offline data, it achieves state-of-the-art performance for dynamic obstacle avoidance, demonstrating statistically safer and less conservative maneuvers compared to existing methods.
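The sketch below shows one way an OOD signal can annotate unlabeled data. It scores each state by its distance to the $k$-th nearest state in the offline dataset and flags states above a threshold. This is not the paper's method. The paper uses a learned rejection-based detector, and the k-nearest-neighbour distance here is a simpler swapped-in technique. The rule of annotating flagged states as unsafe is a hypothetical assumption. So are the names `knn_ood_score` and `annotate_unlabeled` and the `threshold` value.

```python
import numpy as np


def knn_ood_score(queries, data, k=5):
    """Distance from each query state to its k-th nearest dataset state."""
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    data = np.atleast_2d(np.asarray(data, dtype=float))
    # Pairwise Euclidean distances, shape (n_queries, n_data).
    dists = np.linalg.norm(queries[:, None, :] - data[None, :, :], axis=-1)
    k = min(k, data.shape[0])
    return np.sort(dists, axis=1)[:, k - 1]


def annotate_unlabeled(queries, data, k=5, threshold=1.0):
    """Flag states whose OOD score exceeds `threshold` (True = treat as unsafe).

    Treating OOD states as unsafe is an ASSUMED conservative rule for this
    sketch; states judged in-distribution stay unlabeled (False).
    """
    return knn_ood_score(queries, data, k) > threshold
```

This rule is conservative. A state far from everything in the dataset is treated as unsafe, because the data offer no evidence about it. The pairwise-distance matrix grows with the product of query and dataset sizes. For larger datasets, a spatial index such as a KD-tree would be the natural replacement.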

Paper Structure

This paper contains 12 sections, 10 equations, 8 figures, 1 table, and 2 algorithms.

Figures (8)

  • Figure 1: Visualizations on toy datasets illustrating the motivation for using unlabeled data. (a) With sufficient labeled data, the model accurately captures the safety boundary. (b) When labeled data is limited, the learned boundary often misclassifies the safe and unsafe regions of the system. (c) Unlabeled data is generally more accessible than labeled data. Our approach leverages unlabeled data, together with the limited labeled data, to capture the CBF landscape that best adheres to the constraints inherent in the data.
  • Figure 2: Overall learning pipeline of the proposed method. Our method uses offline demonstrations, even those collected directly on real-world platforms, to construct neural control barriers whose zero-superlevel sets are control-invariant. The red contours denote the learned zero level set, which serves as the safety boundary the ego-robot must not cross. Optionally, we gather additional demonstrations to further refine the barrier.
  • Figure 3: Visualization of the model components and the proposed training flows. Optimizing the CBF requires the actor to derive the maximally-safe controls over which we enforce the Lie derivative condition. The actor is optimized using both the CBF and rejection models, capturing the control that leads to the safest in-distribution state. The rejection model is trained independently of the other models.
  • Figure 4: Visualization of the learned CBF landscapes. Early in training, the safety boundary is informed primarily by the collision states in the data, which leads the robot to collide. As training progresses, the CBF model begins to approximate the safe region of the system, but the robot's motions remain jittery because training is incomplete. Once learning converges, the CBF model satisfies the Lyapunov condition on its Lie derivatives, enabling the selection of more aggressive yet still safe controls.
  • Figure 5: Simulation experiments for static obstacle avoidance with different dynamics models of the ego-robot. The evaluation metric is the mean success rate over 100 randomized scenarios, where controls are derived from the trained models following Algorithm 2. When collecting the safe trajectories, we employ potential-field controller(s) with (a) fixed and (b)-(d) randomized parameters.
  • ...and 3 more figures
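The Figure 3 and Figure 5 captions describe deriving controls from the trained models. The sketch below shows a minimal generic version, assuming a known one-step dynamics model: a sampling-based safety filter. Among the candidate controls, it keeps those that satisfy the finite-difference CBF condition and returns the one closest to a nominal command. If none qualifies, it falls back to the candidate with the largest next-step barrier value. This is not the paper's control-derivation algorithm, which relies on the learned actor and rejection model. The names `filter_control`, `step` (a single integrator), and `barrier` (a unit circle) are hypothetical stand-ins.

```python
import numpy as np


def filter_control(x, u_nom, candidates, step, barrier, dt, alpha=1.0):
    """Generic sampling-based CBF safety filter (an assumption, not the paper's algorithm)."""
    u_nom = np.asarray(u_nom, dtype=float)
    b_now = barrier(x)
    best_u, best_cost = None, np.inf
    safest_u, safest_b = None, -np.inf
    for u in candidates:
        u = np.asarray(u, dtype=float)
        b_next = barrier(step(x, u, dt))
        # Admissible if (B(x') - B(x)) / dt + alpha * B(x) >= 0.
        if (b_next - b_now) / dt + alpha * b_now >= 0.0:
            cost = np.linalg.norm(u - u_nom)
            if cost < best_cost:
                best_u, best_cost = u, cost
        # Track the candidate with the largest next-step barrier value.
        if b_next > safest_b:
            safest_u, safest_b = u, b_next
    return best_u if best_u is not None else safest_u


def step(x, u, dt):
    # Hypothetical single-integrator dynamics: x' = x + u * dt.
    return np.asarray(x, dtype=float) + u * dt


def barrier(x):
    # Hypothetical barrier for a unit-radius obstacle at the origin.
    return float(np.dot(x, x) - 1.0)
```

Suppose the robot is at $(1.5, 0)$ and the nominal command points straight at the obstacle. Then `filter_control(np.array([1.5, 0.0]), np.array([-1.0, 0.0]), [[-1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [1.0, 0.0]], step, barrier, dt=0.1)` rejects $(-1, 0)$. It returns $(0, 0)$, the admissible candidate nearest the nominal command.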

Theorems & Definitions (1)

  • Definition 1: Control Barrier Functions (ames2019control)
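For reference, the standard statement of Definition 1 from Ames et al. (2019) is given below, using the symbol $B$ from the TL;DR. It assumes control-affine dynamics, a continuously differentiable $B$, and an extended class-$\mathcal{K}_\infty$ function $\alpha$. Whether the paper states the definition in exactly this form is an assumption.

```latex
\dot{x} = f(x) + g(x)\,u, \qquad u \in U \subseteq \mathbb{R}^m, \qquad
\mathcal{C} = \{\, x \in D : B(x) \ge 0 \,\}, \quad \mathcal{C} \subset D \subset \mathbb{R}^n

\sup_{u \in U} \Big[\, L_f B(x) + L_g B(x)\, u \,\Big] \;\ge\; -\alpha\big(B(x)\big)
\qquad \text{for all } x \in D,

\text{where } L_f B(x) = \nabla B(x)^{\top} f(x), \qquad L_g B(x) = \nabla B(x)^{\top} g(x).
```

In Ames et al., any Lipschitz controller whose outputs satisfy this inequality renders $\mathcal{C}$ forward invariant. This is the sense in which the zero-superlevel set is control-invariant.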