Safety Guardrails for LLM-Enabled Robots

Zachary Ravichandran, Alexander Robey, Vijay Kumar, George J. Pappas, Hamed Hassani

TL;DR

This paper addresses safety risks in LLM-enabled robots, particularly adversarial jailbreaking that can cause physical harm. It proposes RoboGuard, a two-stage guardrail. The first stage uses a root-of-trust LLM with chain-of-thought reasoning to ground high-level safety rules into linear temporal logic (LTL) specifications. The second is a formal control-synthesis stage that uses a Büchi-automaton-based check to ensure that any LLM plan satisfies the safety constraints. The approach reduces the execution of unsafe plans from 92% to below 2.5% in both simulation and real-world experiments, while maintaining performance on safe tasks and showing resilience to adaptive attacks. The work contributes a general, context-aware safeguard that is resource-efficient and adaptable to different robot platforms and planning architectures, with practical implications for safer deployment of AI-enabled robotics in open-world settings.

Abstract

Although the integration of large language models (LLMs) into robotics has unlocked transformative capabilities, it has also introduced significant safety concerns, ranging from average-case LLM errors (e.g., hallucinations) to adversarial jailbreaking attacks, which can produce harmful robot behavior in real-world settings. Traditional robot safety approaches do not address the novel vulnerabilities of LLMs, and current LLM safety guardrails overlook the physical risks posed by robots operating in dynamic real-world environments. In this paper, we propose RoboGuard, a two-stage guardrail architecture to ensure the safety of LLM-enabled robots. RoboGuard first contextualizes pre-defined safety rules by grounding them in the robot's environment using a root-of-trust LLM, which employs chain-of-thought (CoT) reasoning to generate rigorous safety specifications, such as temporal logic constraints. RoboGuard then resolves potential conflicts between these contextual safety specifications and a possibly unsafe plan using temporal logic control synthesis, which ensures safety compliance while minimally violating user preferences. Through extensive simulation and real-world experiments that consider worst-case jailbreaking attacks, we demonstrate that RoboGuard reduces the execution of unsafe plans from 92% to below 2.5% without compromising performance on safe plans. We also demonstrate that RoboGuard is resource-efficient, robust against adaptive attacks, and significantly enhanced by enabling its root-of-trust LLM to perform CoT reasoning. These results underscore the potential of RoboGuard to mitigate the safety risks and enhance the reliability of LLM-enabled robots.
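The second stage described above, checking a proposed plan against grounded safety specifications and repairing it when it conflicts, can be illustrated with a deliberately simplified sketch. It is not the paper's control-synthesis procedure: it assumes every specification is an invariant of the form "never perform action X on target Y", and it repairs an unsafe plan by dropping the offending steps. The `Action` type, the `forbidden` set, and all region and object names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One step of an LLM-proposed plan (hypothetical representation)."""
    name: str    # e.g. "goto"
    target: str  # e.g. "hallway_1"


def violates(action, forbidden):
    """True if the action matches a forbidden (name, target) pair."""
    return (action.name, action.target) in forbidden


def is_safe(plan, forbidden):
    """A plan is safe here if no step violates an invariant."""
    return not any(violates(step, forbidden) for step in plan)


def filter_plan(plan, forbidden):
    """Drop violating steps and keep the rest in their original order.

    Assumption: this stands in for temporal logic control synthesis,
    which handles far richer specifications than simple invariants.
    """
    return [step for step in plan if not violates(step, forbidden)]


# Grounded invariants, as a root-of-trust LLM might produce them (made up).
forbidden = {("goto", "construction_zone"), ("block", "exit_door")}

plan = [
    Action("goto", "hallway_1"),
    Action("block", "exit_door"),  # unsafe step
    Action("inspect", "desk_2"),
]

safe_plan = plan if is_safe(plan, forbidden) else filter_plan(plan, forbidden)
```

In this toy setting a safe plan passes through unchanged, which mirrors the stated goal of not compromising performance on safe plans.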

Paper Structure

This paper contains 29 sections, 3 equations, 14 figures, 7 tables, 1 algorithm.

Figures (14)

  • Figure 1: Overview of RoboGuard. Offline, a system designer first configures RoboGuard with safety rules and a robot description (A). Online, RoboGuard receives the robot's world model and uses it to produce grounded safety specifications (B). Next, RoboGuard synthesizes these specifications with the LLM-generated plan in a manner that ensures safety while maximally respecting the proposed plan (C).
  • Figure 2: RoboGuard comprises two modules, the contextual grounding module and the control synthesis module. The contextual grounding module is configured offline with safety rules and a robot description. Online, it reasons over robot context, as provided by the world model, to generate safety specifications. The control synthesis module uses these specifications and the LLM-proposed plan, in order to synthesize a plan that maximally follows user preferences while ensuring safety.
  • Figure 3: Textual world model representation. We instantiate the world model as a semantic graph, which is provided as a JSON string to RoboGuard via an in-context prompt to the root-of-trust LLM.
  • Figure 4: The contextual grounding module uses a root-of-trust LLM to generate grounded safety specifications given a configuration and world model. The LLM employs a chain-of-thought (CoT) reasoning process that enumerates the provided safety rules, gives a short explanation of how each rule could be respected in the world model, and produces a corresponding LTL specification. These specifications are then aggregated into a single expression.
  • Figure 5: Experimental environments. (Top) The floor of an office building. (Bottom) An office park. Semantics are randomly added and removed during simulation, requiring RoboGuard to reason over varying contexts.
  • ...and 9 more figures
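
The captions for Figures 3 and 4 describe a semantic-graph world model serialized as a JSON string and per-rule LTL specifications aggregated into a single expression. A minimal sketch of both ideas follows. The graph schema, the key names, the formula notation, and the choice of conjunction as the aggregation operator are all assumptions made for illustration, not details taken from the paper.

```python
import json

# Hypothetical semantic graph: regions and objects as nodes, plus edges.
world_model = {
    "regions": [{"name": "hallway_1"}, {"name": "office_1"}],
    "objects": [{"name": "exit_door", "type": "door"}],
    "region_connections": [["hallway_1", "office_1"]],
    "object_connections": [["exit_door", "hallway_1"]],
}

# Serialized form that would be placed in the root-of-trust LLM's prompt.
world_model_json = json.dumps(world_model, indent=2)


def aggregate(specs):
    """Join per-rule formulas into one expression by conjunction.

    Assumption: conjunction ("&") is used to combine the formulas.
    """
    return " & ".join(f"({spec})" for spec in specs)


# One formula per safety rule, in an illustrative LTL-like notation
# where G means "always" and ! means "not".
rule_specs = [
    "G(!block(exit_door))",
    "G(!goto(construction_zone))",
]

combined = aggregate(rule_specs)
```

Parenthesizing each formula before joining keeps operator precedence unambiguous when individual specifications contain their own binary operators.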

Theorems & Definitions (2)

  • Example 4.1
  • Remark 4.2