From Words to Safety: Language-Conditioned Safety Filtering for Robot Navigation
Zeyuan Feng, Haimingyue Zhang, Somil Bansal
TL;DR
This work tackles semantic safety for language-conditioned robot navigation with a modular framework that separates language understanding, perception grounding, and runtime safety enforcement. The Language Module converts natural-language safety instructions into structured configurations; the Perception Module grounds these constraints in persistent 3D representations by fusing open-vocabulary and panoptic segmentation; and the Safety Filter applies sampling-based, MPC-style planning to enforce both semantic and geometric constraints with minimal deviation from nominal control. Key contributions include a flexible three-component architecture, a three-class constraint taxonomy with a JSON-style specification, a perception-grounded semantic failure set, and two parallel safety filters that run in real time, validated in simulation and on hardware. The framework robustly interprets and enforces diverse language-specified constraints in dynamic environments, improving semantic safety without unduly sacrificing task progress and thereby supporting trust and reliability in human-robot interaction. The approach enables open-world runtime monitoring via LLMs while maintaining auditable, actionable safety guarantees for navigation tasks.
Abstract
As robots become increasingly integrated into open-world, human-centered environments, their ability to interpret natural language instructions and adhere to safety constraints is critical for effective and trustworthy interaction. Existing approaches often focus on mapping language to reward functions rather than to safety specifications, or they address only narrow constraint classes (e.g., obstacle avoidance), which limits their robustness and applicability. We propose a modular framework for language-conditioned safety in robot navigation. Our framework is composed of three core components: (1) a large language model (LLM)-based module that translates free-form instructions into structured safety specifications, (2) a perception module that grounds these specifications by maintaining object-level 3D representations of the environment, and (3) a model predictive control (MPC)-based safety filter that enforces both semantic and geometric constraints in real time. We evaluate the proposed framework in both simulation studies and hardware experiments, demonstrating that it robustly interprets and enforces diverse language-specified constraints across a wide range of environments and scenarios.
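
To make the three-stage pipeline above concrete, the following is a minimal sketch of how a structured safety specification might be enforced by a sampling-based filter. It is not the authors' implementation. Everything in it is an assumption made for illustration: the specification fields (`object`, `type`, `min_distance`), the `OBJECTS` dictionary standing in for the perception output, the unicycle dynamics, the constant-control rollouts, and the names `rollout`, `is_safe`, and `safety_filter`.

```python
import math
import random

# Hypothetical structured specification, standing in for what a language
# module might emit for "stay at least 1 m from the wet floor sign".
SPEC = [{"object": "wet floor sign", "type": "keep_away", "min_distance": 1.0}]

# Hypothetical perception output: object label -> (x, y) position in metres.
OBJECTS = {"wet floor sign": (2.0, 0.0)}


def rollout(state, control, horizon=20, dt=0.1):
    """Simulate assumed unicycle dynamics while holding (v, w) constant."""
    x, y, th = state
    v, w = control
    traj = []
    for _ in range(horizon):
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += w * dt
        traj.append((x, y, th))
    return traj


def is_safe(traj, spec, objects):
    """Return True if no state in traj violates any grounded constraint."""
    for constraint in spec:
        if constraint["object"] not in objects:
            continue  # constraint not grounded by perception yet
        ox, oy = objects[constraint["object"]]
        for x, y, _ in traj:
            if math.hypot(x - ox, y - oy) < constraint["min_distance"]:
                return False
    return True


def safety_filter(state, nominal, spec, objects, n_samples=200, seed=0):
    """Pass the nominal control through if its rollout is safe; otherwise
    return the sampled safe control closest to it. Falls back to stopping,
    which this sketch assumes is safe."""
    if is_safe(rollout(state, nominal), spec, objects):
        return nominal
    rng = random.Random(seed)
    best, best_cost = (0.0, 0.0), float("inf")
    for _ in range(n_samples):
        cand = (rng.uniform(0.0, 1.0), rng.uniform(-1.5, 1.5))
        if not is_safe(rollout(state, cand), spec, objects):
            continue
        cost = (cand[0] - nominal[0]) ** 2 + (cand[1] - nominal[1]) ** 2
        if cost < best_cost:
            best, best_cost = cand, cost
    return best


if __name__ == "__main__":
    start = (0.0, 0.0, 0.0)
    print(safety_filter(start, (0.3, 0.0), SPEC, OBJECTS))  # safe nominal
    print(safety_filter(start, (1.0, 0.0), SPEC, OBJECTS))  # overridden
```

In this sketch the filter intervenes only when the nominal command's predicted trajectory enters the keep-away region, which mirrors the idea of enforcing constraints with minimal deviation from nominal control. A real system would need richer constraint types, dynamic obstacles, and a principled fallback policy.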
