A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents

Yuting Huang, Leilei Ding, Zhipeng Tang, Tianfu Wang, Xinrui Lin, Wuyang Zhang, Mingxiao Ma, Yanyong Zhang

TL;DR

The paper addresses safety hazards that arise during LLM-driven embodied task planning and presents Safe-BeAl, a dual framework comprising SafePlan-Bench (for comprehensive safety benchmarking) and Safe-Align (for aligning agents with physical-world safety knowledge). SafePlan-Bench formalizes embodied task-planning safety via Process and Termination constraints, builds the SafeRisks hazard dataset (2,027 samples across 8 hazard categories), and integrates a safety detector with VirtualHome to jointly evaluate safety and task success. Safe-Align introduces atomic-action level alignment using a weighted Bradley–Terry–inspired objective to emphasize error-prone steps while preserving planning performance, trained on a paired safe/unsafe action dataset. Across multiple embodied baselines, Safe-BeAl improves safety by 8.55–15.22% over GPT-4 baselines while maintaining task completion, demonstrating a practical pathway to safer real-world deployment of LLM-based embodied agents.
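
The weighted Bradley–Terry-inspired objective mentioned above can be illustrated in generic form. This summary does not reproduce the paper's own equation, so the following is only a sketch under stated assumptions: the symbols $s_t$ (state before step $t$), $a_t^{+}$ and $a_t^{-}$ (the paired safe and unsafe atomic actions), $r_\theta$ (a learned score for an action) and $w_t$ (a per-step weight) are all introduced here for illustration and may differ from the authors' notation and formulation.

```latex
% Standard Bradley-Terry preference probability for one paired step
% (the safe action is preferred over the unsafe one):
P\left(a_t^{+} \succ a_t^{-} \mid s_t\right)
  = \sigma\!\left(r_\theta(s_t, a_t^{+}) - r_\theta(s_t, a_t^{-})\right),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}

% Assumed weighted per-action loss over a plan of T atomic actions;
% a larger w_t puts more emphasis on an error-prone step:
\mathcal{L}(\theta)
  = -\sum_{t=1}^{T} w_t \,
    \log \sigma\!\left(r_\theta(s_t, a_t^{+}) - r_\theta(s_t, a_t^{-})\right)
```

With all $w_t = 1$ this reduces to an unweighted sum of pairwise Bradley–Terry log-likelihoods. Raising $w_t$ on steps where the agent tends to err is one way to emphasize those steps without changing the treatment of the rest of the plan.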

Abstract

Large Language Models (LLMs) exhibit substantial promise in enhancing task-planning capabilities within embodied agents due to their advanced reasoning and comprehension. However, the systemic safety of these agents remains an underexplored frontier. In this study, we present Safe-BeAl, an integrated framework for the measurement (SafePlan-Bench) and alignment (Safe-Align) of LLM-based embodied agents' behaviors. SafePlan-Bench establishes a comprehensive benchmark for evaluating task-planning safety, encompassing 2,027 daily tasks and corresponding environments distributed across 8 distinct hazard categories (e.g., Fire Hazard). Our empirical analysis reveals that even in the absence of adversarial inputs or malicious intent, LLM-based agents can exhibit unsafe behaviors. To mitigate these hazards, we propose Safe-Align, a method designed to integrate physical-world safety knowledge into LLM-based embodied agents while maintaining task-specific performance. Experiments across a variety of settings demonstrate that Safe-BeAl provides comprehensive safety validation and improves safety by 8.55–15.22% over embodied agents based on GPT-4, while ensuring successful task completion.

Paper Structure

This paper contains 30 sections, 18 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Three cases that pose safety hazards. (Top) Task: organize the groceries. The agent places too many groceries on the cooktop, which may lead to a Fire Hazard. (Middle) Task: cook some food. The agent completes the cooking task, but the stove remains left on, posing a potential Fire Hazard. (Bottom) Task: clean the floor. The agent cleans the floor, but water stains remain, posing a potential fall hazard.
  • Figure 2: (a) (Left) The definition of Embodied Task-Planning Safety based on two constraints: Process Safety Constraints and Termination Safety Constraints. (Middle) The overall data generation pipeline of SafePlan-Bench. (Right) The Safety Evaluation method. (b) (Left) An example of an embodied agent causing a safety hazard. (Right) The overall framework of Safe-Align, which treats each atomic action as an optimization unit and focuses on learning from erroneous actions.
  • Figure 3: An example of task planning. For the task "cook some food", the agent executes a series of actions that ultimately alter the environmental state. While the task is successfully completed, it introduces several safety hazards, such as Fire Hazard.
  • Figure 4: Distribution of 8 types of safety hazards in SafeRisks, along with the top 5 most common objects associated with each hazard type.
  • Figure 5: The counts of violations on Process Safety and Termination Safety Constraints across all methods.
  • ...and 5 more figures

Theorems & Definitions (3)

  • Definition 1: Process Safety Constraints
  • Definition 2: Termination Safety Constraints
  • Definition 3: Task-Planning Safety
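
The split between Process and Termination Safety Constraints can be sketched in a few lines of Python. This is a toy illustration, not the paper's safety detector or the VirtualHome API: the symbolic state format, the `Constraint` class, the `apply_action` and `check_plan` helpers, and both example constraints are hypothetical names introduced here. The assumed reading is that process constraints must hold after every atomic action, while termination constraints are checked only on the final state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical symbolic state: object name -> current status.
State = Dict[str, str]
# Hypothetical atomic action: (verb, object, resulting status).
Action = Tuple[str, str, str]


@dataclass
class Constraint:
    name: str
    violated: Callable[[State], bool]  # True when the state is unsafe


def apply_action(state: State, action: Action) -> State:
    """Return a copy of the state with the action's effect applied."""
    _verb, obj, status = action
    new_state = dict(state)
    new_state[obj] = status
    return new_state


def check_plan(initial: State, plan: List[Action],
               process: List[Constraint],
               termination: List[Constraint]) -> List[Tuple[str, str]]:
    """Collect (where, constraint name) pairs for every violation."""
    violations = []
    state = dict(initial)
    for step, action in enumerate(plan):
        state = apply_action(state, action)
        # Process constraints are checked after every atomic action.
        for c in process:
            if c.violated(state):
                violations.append((f"step {step}", c.name))
    # Termination constraints are checked once, on the final state.
    for c in termination:
        if c.violated(state):
            violations.append(("end", c.name))
    return violations


# Assumed example constraints, loosely modeled on the Fire Hazard cases.
process_constraints = [Constraint(
    "no_flammables_on_hot_stove",
    lambda s: s.get("stove") == "on" and s.get("towel") == "on_stove")]
termination_constraints = [Constraint(
    "stove_off_at_end", lambda s: s.get("stove") == "on")]

initial_state = {"stove": "off", "towel": "on_counter"}
unsafe_plan = [("switch_on", "stove", "on"),
               ("put", "towel", "on_stove"),
               ("grab", "towel", "in_hand")]
safe_plan = [("switch_on", "stove", "on"),
             ("switch_off", "stove", "off")]

violations = check_plan(initial_state, unsafe_plan,
                        process_constraints, termination_constraints)
safe_violations = check_plan(initial_state, safe_plan,
                             process_constraints, termination_constraints)
print(violations)
print(safe_violations)
```

In this toy run the unsafe plan triggers one process violation (the towel is on the hot stove at step 1) and one termination violation (the stove is still on at the end), while the safe plan triggers none. A plan can therefore complete its task and still be unsafe, which is the situation Figure 3 describes.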