Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards
Lukas Brunke, Yanni Zhang, Ralf Römer, Jack Naimer, Nikola Staykov, Siqi Zhou, Angela P. Schoellig
TL;DR
This paper presents a semantic safety filter that integrates semantic scene understanding and LLM-driven reasoning with formal safety guarantees to enable safe robot manipulation in human-centric environments. By building an open-vocabulary 3D semantic map, synthesizing constraints across spatial, behavioral, and pose dimensions, and enforcing them through a control barrier certification formulation, the approach certifies high-level commands and learned policies in real time. The method demonstrates safety benefits beyond collision avoidance in tabletop and real kitchen experiments, including zero semantic-constraint violations and substantial reductions in end-effector rotation when semantic constraints call for caution. Overall, the work provides a practical pathway to incorporate common-sense semantics into rigorous safety certificates for robust, human-aware robotics.
Abstract
Ensuring safe interactions in human-centric environments requires robots to understand and adhere to constraints recognized by humans as "common sense" (e.g., "moving a cup of water above a laptop is unsafe as the water may spill" or "rotating a cup of water is unsafe as it can lead to pouring its contents"). Recent advances in computer vision and machine learning have enabled robots to acquire a semantic understanding of their operating environments and to reason about them. While extensive literature on safe robot decision-making exists, semantic understanding is rarely integrated into these formulations. In this work, we propose a semantic safety filter framework to certify robot inputs with respect to semantically defined constraints (e.g., unsafe spatial relationships, behaviors, and poses) and geometrically defined constraints (e.g., environment-collision and self-collision constraints). In our proposed approach, given perception inputs, we build a semantic map of the 3D environment and leverage the contextual reasoning capabilities of large language models to infer semantically unsafe conditions. These semantically unsafe conditions are then mapped to safe actions through a control barrier certification formulation. We demonstrate the proposed semantic safety filter in teleoperated manipulation tasks and with learned diffusion policies applied in a real-world kitchen environment, which further showcases its effectiveness in addressing practical semantic safety constraints. Together, these experiments highlight our approach's capability to integrate semantics into safety certification, enabling safe robot operation beyond traditional collision avoidance.
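To make the idea of mapping an unsafe condition to a safe action concrete, the following is a minimal sketch of a control barrier function (CBF) filter. It is not the paper's formulation, which is not given in this section. It assumes a single-integrator end-effector model (position x, commanded velocity u), and it approximates one semantically unsafe region (for example, the space above a laptop while holding a cup of water) by a ball with a hypothetical center and radius. The function name cbf_filter and the gain alpha are illustrative choices.

```python
import numpy as np


def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Minimally modify u_nom so the end effector stays outside a ball.

    Assumptions (illustrative, not taken from the paper):
      - dynamics are a single integrator, x_dot = u
      - the unsafe region is the ball ||x - center|| < radius
      - barrier h(x) = ||x - center||^2 - radius^2, safe when h(x) >= 0
    The CBF condition grad_h(x) . u >= -alpha * h(x) is a single half-space
    constraint on u, so the usual quadratic program reduces to a projection.
    """
    x = np.asarray(x, dtype=float)
    u_nom = np.asarray(u_nom, dtype=float)
    center = np.asarray(center, dtype=float)

    h = float(np.dot(x - center, x - center) - radius**2)
    a = 2.0 * (x - center)   # gradient of h with respect to x
    b = -alpha * h           # required lower bound on a . u

    if np.dot(a, u_nom) >= b:
        return u_nom         # nominal command already satisfies the condition

    a_sq = float(np.dot(a, a))
    if a_sq == 0.0:
        return np.zeros_like(u_nom)  # degenerate case at the ball's center: stop

    # Closest command (in the Euclidean norm) that satisfies a . u >= b.
    return u_nom + (b - np.dot(a, u_nom)) / a_sq * a


# Usage: a command that drives straight at the unsafe region is slowed down,
# while a command that moves away from it passes through unchanged.
x = np.array([2.0, 0.0, 0.0])
center = np.zeros(3)
u_toward = cbf_filter(x, np.array([-10.0, 0.0, 0.0]), center, radius=1.0)
u_away = cbf_filter(x, np.array([1.0, 0.0, 0.0]), center, radius=1.0)
print(u_toward, u_away)
```

A filter for a real manipulator would work with the robot's actual dynamics and several simultaneous constraints, which generally requires solving a quadratic program at each control step rather than using this closed-form projection.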
