
Updating Robot Safety Representations Online from Natural Language Feedback

Leonardo Santos, Zirui Li, Lasse Peters, Somil Bansal, Andrea Bajcsy

TL;DR

This work uses vision-language models (VLMs) to interpret language feedback together with the robot's image observations, continuously updating the robot's representation of safety constraints. Simulation and hardware experiments show that, with the proposed approach, the robot can infer and respect language-based safety constraints.

Abstract

Robots must operate safely when deployed in novel and human-centered environments, such as homes. Current safe control approaches typically assume that the safety constraints are known a priori, so the robot can pre-compute a corresponding safety controller. While this may make sense for some safety constraints (e.g., avoiding collisions with walls by analyzing a floor plan), other constraints are more complex (e.g., spills), inherently personal, and context-dependent, and they can only be identified at deployment time, when the robot is interacting with a specific environment and a specific person (e.g., fragile objects, expensive rugs). Here, language provides a flexible mechanism to communicate these evolving safety constraints to the robot. In this work, we use vision-language models (VLMs) to interpret language feedback and the robot's image observations to continuously update the robot's representation of safety constraints. With these inferred constraints, we update a Hamilton-Jacobi reachability safety controller online via efficient warm-starting techniques. Through simulation and hardware experiments, we demonstrate the robot's ability to infer and respect language-based safety constraints with the proposed approach.
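The online update step (adding an inferred constraint to the failure set, then re-solving the safety problem warm-started from the previous solution) can be illustrated with a small Python/NumPy sketch. It rests on strong simplifying assumptions that are not from the paper: a discrete-time, grid-based analogue of Hamilton-Jacobi reachability (safety value iteration V ← min(l, max_a V(f(x, a)))) instead of a continuous-time level-set solver, a toy 1D position-velocity robot, and a failure set that only grows as semantic constraints arrive. The names `margin`, `successors`, `solve_safety_value`, and the "rug" scenario are hypothetical illustrations, not the authors' code.

```python
import numpy as np

# Toy robot (hypothetical): 1D position p in {0..N_POS-1}, velocity v in {-2..2},
# acceleration a in {-1, 0, 1}; v' = clip(v + a), p' = clip(p + v').
N_POS = 20
VELS = np.arange(-2, 3)
ACTIONS = (-1, 0, 1)


def margin(failure_mask: np.ndarray) -> np.ndarray:
    """Margin l(p): -1 inside the failure set, else grid distance to it."""
    pos = np.arange(failure_mask.size)
    idx = np.flatnonzero(failure_mask)
    if idx.size == 0:
        return np.full(failure_mask.size, float(failure_mask.size))
    dist = np.min(np.abs(pos[:, None] - idx[None, :]), axis=1).astype(float)
    return np.where(failure_mask, -1.0, dist)


def successors():
    """For each action, the (position, velocity-index) grids of next states."""
    P, VI = np.meshgrid(np.arange(N_POS), np.arange(VELS.size), indexing="ij")
    nxt = []
    for a in ACTIONS:
        v2 = np.clip(VELS[VI] + a, VELS[0], VELS[-1])
        p2 = np.clip(P + v2, 0, N_POS - 1)
        nxt.append((p2, v2 - VELS[0]))  # shift velocity to a column index
    return nxt


SUCC = successors()


def solve_safety_value(l, V0=None, max_iters=1000):
    """Safety value iteration V <- min(l, max_a V(f(x, a))).

    Cold start: V0 = l. Warm start: V0 = min(V_prev, l), where V_prev was
    computed for a smaller failure set. Returns (V, number_of_sweeps).
    """
    L = np.repeat(l[:, None], VELS.size, axis=1)
    V = L.copy() if V0 is None else np.minimum(V0, L)
    for it in range(max_iters):
        best = np.max([V[p2, vi2] for p2, vi2 in SUCC], axis=0)
        V_new = np.minimum(L, best)
        if np.array_equal(V_new, V):  # exact fixed point reached
            return V_new, it + 1
        V = V_new
    return V, max_iters


# Offline: a physical obstacle (e.g., a wall) at p = 15.
physical = np.zeros(N_POS, dtype=bool)
physical[15] = True
V_old, _ = solve_safety_value(margin(physical))

# Online: a hypothetical VLM-inferred semantic constraint ("avoid the rug")
# marks cells 5-6; the new failure set is the union of both.
semantic = np.zeros(N_POS, dtype=bool)
semantic[5:7] = True
l_new = margin(physical | semantic)

V_cold, sweeps_cold = solve_safety_value(l_new)
V_warm, sweeps_warm = solve_safety_value(l_new, V0=V_old)
print(np.array_equal(V_cold, V_warm), sweeps_warm, sweeps_cold)
```

In this sketch, adding constraints can only lower the margin, so the previous value function upper-bounds the new one; initializing from min(V_prev, l_new) therefore reaches the same fixed point as a cold start and never needs more sweeps. If a constraint were removed, that argument no longer applies, and the warm-started result may be overly conservative.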

Paper Structure (12 sections, 9 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Natural language provides an intuitive interface for people to specify constraints they care about online, like restricted areas behind caution tape or coffee spills. We leverage advances in vision-language models to interpret multimodal language and image data, infer semantically-meaningful constraints, and update robot safety controllers online. Video results and code at the project website: https://cmu-intentlab.github.io/language-informed-safe-navigation/.
  • Figure 2: Updating Robot Safety Representations Online from Language Feedback. (left) Offline, the robot has an initial failure set ($\hat{\mathcal{F}}^{0}_E$) and computes the corresponding safe set ($\mathcal{S}^{*,0}$) and safety policy ($\pi^{*,0}_{\mathcal{R}}$). (right) Online, the person describes their semantic constraint. Using a vision-language model, the robot converts the language-image data into a new failure set. This new failure set, together with the previously computed safe set, is used to efficiently update the safety filter that shields the robot.
  • Figure 3: Simulation: Closed-Loop Behavior. (left) Two simulated scenes from the HSSD-HAB dataset [minderer2022simple], the final physical and semantic failure set and the corresponding unsafe set, and the closed-loop trajectories of all methods. (right) Failure-set inference accuracy as a function of the language command. Metrics compare the ground-truth failure set $\mathcal{F}^*_E$ and the inferred failure set $\hat{\mathcal{F}}^T_E$.
  • Figure 4: Simulation: Language Timing. Our Safe-Lang method is more robust to feedback timing than Plan-Lang.
  • Figure 5: Hardware: Closed-Loop Motion. Without semantic constraints, Plan-SLAM cuts through the caution tape zone. Safe-Lang respects both the physical and semantic constraints.
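Figure 2 describes the safe set and safety policy being used as a safety filter that shields the robot. One common way to build such a filter is a least-restrictive rule: pass the nominal command through whenever its successor stays in the safe set, otherwise switch to the value-maximizing action. The Python sketch below assumes discrete actions, a one-step model `step`, and a safety value function `value` whose nonnegative superlevel set is the safe set; `safety_filter` and the toy example are hypothetical illustrations, and the paper's exact switching rule is not stated in this summary.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type


def safety_filter(
    x: S,
    u_nom: A,
    actions: Iterable[A],
    step: Callable[[S, A], S],
    value: Callable[[S], float],
    threshold: float = 0.0,
) -> A:
    """Least-restrictive filter: keep u_nom if its successor satisfies
    value >= threshold; otherwise fall back to the action maximizing value."""
    if value(step(x, u_nom)) >= threshold:
        return u_nom
    return max(actions, key=lambda u: value(step(x, u)))


# Toy example (hypothetical): a 1D single integrator that may also stay put,
# with a semantic "no-go" cell at p = 5, so the safety value equals the
# margin |p - 5| - 0.5.
ACTIONS = (-1, 0, 1)


def step(p: int, u: int) -> int:
    return p + u


def value(p: int) -> float:
    return abs(p - 5) - 0.5


print(safety_filter(3, 1, ACTIONS, step, value))  # 1: moving to p=4 is still safe
print(safety_filter(4, 1, ACTIONS, step, value))  # -1: p=5 is unsafe, so back away
```

Because the nominal command is overridden only near the boundary of the safe set, the filter stays minimally invasive; the fallback here is a discrete stand-in for the safety policy $\pi^{*}_{\mathcal{R}}$.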