Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards

Lukas Brunke, Yanni Zhang, Ralf Römer, Jack Naimer, Nikola Staykov, Siqi Zhou, Angela P. Schoellig

TL;DR

This paper presents a semantic safety filter that integrates semantic scene understanding and LLM-driven reasoning with formal safety guarantees to enable safe robot manipulation in human-centric environments. By building an open-vocabulary 3D semantic map, synthesizing constraints across spatial, behavioral, and pose dimensions, and enforcing them through a control-barrier-certification framework, the approach certifies high-level commands and learned policies in real time. The method demonstrates safety benefits beyond collision avoidance in tabletop and real kitchen experiments, including zero semantic-constraint violations and substantial reductions in end-effector rotation when semantic constraints call for caution. Overall, the work provides a practical pathway to incorporate common-sense semantics into rigorous safety certificates for robust, human-aware robotics.

Abstract

Ensuring safe interactions in human-centric environments requires robots to understand and adhere to constraints recognized by humans as "common sense" (e.g., "moving a cup of water above a laptop is unsafe as the water may spill" or "rotating a cup of water is unsafe as it can lead to pouring its content"). Recent advances in computer vision and machine learning have enabled robots to acquire a semantic understanding of and reason about their operating environments. While extensive literature on safe robot decision-making exists, semantic understanding is rarely integrated into these formulations. In this work, we propose a semantic safety filter framework to certify robot inputs with respect to semantically defined constraints (e.g., unsafe spatial relationships, behaviors, and poses) and geometrically defined constraints (e.g., environment-collision and self-collision constraints). In our proposed approach, given perception inputs, we build a semantic map of the 3D environment and leverage the contextual reasoning capabilities of large language models to infer semantically unsafe conditions. These semantically unsafe conditions are then mapped to safe actions through a control barrier certification formulation. We demonstrate the proposed semantic safety filter in teleoperated manipulation tasks and with learned diffusion policies applied in a real-world kitchen environment that further showcases its effectiveness in addressing practical semantic safety constraints. Together, these experiments highlight our approach's capability to integrate semantics into safety certification, enabling safe robot operation beyond traditional collision avoidance.
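The "control barrier certification formulation" mentioned above can be illustrated with the generic textbook form of a control-barrier-function (CBF) safety filter. This is a sketch under stated assumptions, not the paper's own equations: it assumes a velocity-controlled manipulator with joint configuration $\bm{q}$ and dynamics $\dot{\bm{q}} = \bm{u}$, and the symbols $h_i$ (one safe-set function per constraint, with $h_i \geq 0$ meaning safe), $\alpha_i$ (a class $\mathcal{K}_\infty$ function per constraint), $N$ (the number of constraints), and $\bm{u}_\text{safe}$ are names introduced here for illustration.

```latex
\begin{aligned}
\bm{u}_\text{safe} = \arg\min_{\bm{u}} \quad & \lVert \bm{u} - \bm{u}_\text{cmd} \rVert_2^2 \\
\text{s.t.} \quad & \frac{\partial h_i}{\partial \bm{q}}(\bm{q})\, \bm{u} \geq -\alpha_i\big(h_i(\bm{q})\big), \quad i = 1, \dots, N
\end{aligned}
```

In this form, the filter changes the uncertified command $\bm{u}_\text{cmd}$ as little as possible while keeping every $h_i$ non-negative. Under the stated assumptions, semantic and geometric constraints would differ only in how their $h_i$ are constructed.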

Paper Structure

This paper contains 20 sections, 8 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: We propose a semantic safety filter framework that leverages semantic scene understanding and contextual reasoning capabilities of large language models to certify robot motions with "common sense" constraints. For example, if a manipulator is carrying a cup of water, our proposed semantic safety filter prevents moving the cup above a laptop in the environment to avoid potential spillage (top). In contrast, if the robot is tasked with transporting a dry sponge, it is allowed to move over a laptop (bottom). An overview of the work with experimental demonstration results can be found on our website https://utiasdsl.github.io/semantic-manipulation/ and in our short video https://tiny.cc/semantic-manipulation.
  • Figure 2: An overview of our proposed semantic safety filter framework. The perception module segments the visual input and builds a semantic world representation. The LLM is queried based on the list of semantic labels and the manipulated object. It outputs the semantic context $\mathcal{S}$, which contains a list of unsafe spatial relationship-based semantic constraints for each object in the scene, a list of behavior-based semantic constraints, and a pose-based semantic constraint. The semantic context, together with the point clouds of the objects in the scene, is then used to define safe sets for our proposed semantic safety filter. Additionally, based on the semantic context, the safety filter's parameters are adapted, for example, to prevent end effector rotations or to approach certain objects more carefully. At each time step, a high-level uncertified command from a human operator or a motion policy is mapped to the joint velocity $\bm{u}_\text{cmd}$ through differential inverse kinematics, certified by the proposed semantic safety filter, and then sent to the robot system.
  • Figure 3: Examples of the environment collision and semantic constraints enforced by our proposed semantic safety filter. For each scene, environment collision constraints are generated based on the point clouds of individual objects while the semantic constraints are synthesized based on the point clouds and labels of individual objects as well as the semantic safety conditions from the LLM. The semantic safety conditions are further categorized into spatial relationship constraints (blue text), behavioral constraints (orange text), and end effector pose constraints (green text).
  • Figure 4: The level of caution determines how quickly the end effector approaches a safety constraint boundary. In the books scene, we increase caution by adjusting the class $\mathcal{K}_\infty$ function when holding a cup of water under the same semantic constraint during teleoperation. In the cautious case, the negative time derivatives remain below the red dashed line, satisfying the CBF condition. Since $\alpha_{\text{sem,c}} < \alpha_{\text{sem}}$, the end effector approaches the boundary more slowly. Note that the $y$-axis is inverted.
  • Figure 5: Demonstration of the active (inactive) rotation constraint when the robot is holding a cup of water (dry sponge) in the books scene. The distribution for the cup of water is skewed towards smaller angular velocities; an active rotation constraint (red) generally yields reduced end effector rotations as compared to the inactive case (blue).
  • ...and 1 more figure
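
The caution mechanism described in the Figure 4 caption (a smaller class $\mathcal{K}_\infty$ gain makes the end effector approach a constraint boundary more slowly) can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the authors' implementation: it uses a single constraint, single-integrator dynamics, a linear class $\mathcal{K}_\infty$ function `alpha * h`, and the closed-form solution of the resulting one-constraint quadratic program. The names `cbf_filter` and `simulate`, and all numeric values, are hypothetical.

```python
import numpy as np


def cbf_filter(u_cmd, grad_h, h, alpha):
    """Minimally modify u_cmd so that grad_h @ u >= -alpha * h.

    Assumes single-integrator dynamics (x_dot = u), one constraint,
    a linear class-K function alpha * h, and a nonzero gradient grad_h.
    """
    u_cmd = np.asarray(u_cmd, dtype=float)
    grad_h = np.asarray(grad_h, dtype=float)
    # Positive slack means the command already satisfies the CBF condition.
    slack = grad_h @ u_cmd + alpha * h
    if slack >= 0.0:
        return u_cmd
    # Otherwise project the command onto the constraint boundary.
    return u_cmd - slack * grad_h / (grad_h @ grad_h)


def simulate(alpha, h0=1.0, u_cmd=-1.0, dt=0.01, steps=200):
    """Drive a 1-D state toward the boundary h = 0 with a constant command.

    Here the state is the distance to the boundary, so h(x) = x and
    grad_h = 1 (an assumption made purely for illustration).
    """
    h = h0
    history = [h]
    for _ in range(steps):
        u = cbf_filter([u_cmd], [1.0], h, alpha)[0]
        h = h + dt * u  # forward-Euler step of x_dot = u
        history.append(h)
    return np.array(history)


# Hypothetical gains: the cautious gain is smaller than the nominal one.
nominal = simulate(alpha=2.0)
cautious = simulate(alpha=0.5)
```

In this toy setting both runs keep `h` positive, and the run with the smaller gain ends farther from the boundary after the same amount of time, which mirrors the qualitative behavior described for $\alpha_{\text{sem,c}} < \alpha_{\text{sem}}$.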