MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning

Junjian Wang, Lidan Zhao, Xi Sheryl Zhang

TL;DR

MADRA tackles safety in embodied task planning by replacing costly training-based risk methods with a training-free, multi-agent debate framework guided by a Critical Evaluator. The system integrates memory-augmented hierarchical planning and a self-evolution loop to continuously improve task success while maintaining strong safety, achieving over 90% unsafe-task rejection with a low rate of false rejections on safe tasks across AI2-THOR and VirtualHome. The SafeAware-VH dataset of 800 annotated household instructions supports safety evaluation, and experiments demonstrate robust generalization across diverse LLMs and environments. The work provides a scalable, model-agnostic module for trustworthy embodied agents, with practical implications for safer real-world deployment and future multimodal extensions.

Abstract

Ensuring the safety of embodied AI agents during task planning is critical for real-world deployment, especially in household environments where dangerous instructions pose significant risks. Existing methods often suffer from either high computational costs due to preference alignment training or over-rejection when using single-agent safety prompts. To address these limitations, we propose MADRA, a training-free Multi-Agent Debate Risk Assessment framework that leverages collective reasoning to enhance safety awareness without sacrificing task performance. MADRA employs multiple LLM-based agents to debate the safety of a given instruction, guided by a critical evaluator that scores responses based on logical soundness, risk identification, evidence quality, and clarity. Through iterative deliberation and consensus voting, MADRA significantly reduces false rejections while maintaining high sensitivity to dangerous tasks. Additionally, we introduce a hierarchical cognitive collaborative planning framework that integrates safety, memory, planning, and self-evolution mechanisms to improve task success rates through continuous learning. We also contribute SafeAware-VH, a benchmark dataset for safety-aware task planning in VirtualHome, containing 800 annotated instructions. Extensive experiments on AI2-THOR and VirtualHome demonstrate that our approach achieves over 90% rejection of unsafe tasks while ensuring that safe-task rejection is low, outperforming existing methods in both safety and execution efficiency. Our work provides a scalable, model-agnostic solution for building trustworthy embodied agents.
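The debate loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Response`, `make_agent`, `toy_evaluator`, and `debate` are hypothetical, the agents are keyword stubs standing in for LLM calls, and the stopping rule (stop on unanimity, otherwise fall back to a majority vote with ties counted as unsafe) is an assumption, since the abstract only states that agents deliberate iteratively and vote. Only the four scoring criteria are taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# The four criteria named in the abstract; everything else here is illustrative.
CRITERIA = ("logical_soundness", "risk_identification", "evidence_quality", "clarity")


@dataclass
class Response:
    verdict: str    # "safe" or "unsafe"
    rationale: str  # short justification shown to the other agents


Agent = Callable[[str, str], Response]            # (instruction, feedback) -> Response
Evaluator = Callable[[Response], Dict[str, float]]  # one score per criterion


def make_agent(keywords: Sequence[str]) -> Agent:
    """Hypothetical stand-in for an LLM agent: flags keyword hits, or defers
    to feedback from the previous round that names a concrete risk."""
    def agent(instruction: str, feedback: str) -> Response:
        hits = [k for k in keywords if k in instruction.lower()]
        if hits:
            return Response("unsafe", "risk: involves " + ", ".join(hits))
        if feedback.startswith("risk:"):
            return Response("unsafe", feedback)
        return Response("safe", "no hazard found")
    return agent


def toy_evaluator(response: Response) -> Dict[str, float]:
    """Hypothetical critical evaluator: rewards rationales that name a risk."""
    named_risk = 1.0 if response.rationale.startswith("risk:") else 0.0
    scores = {"logical_soundness": 0.5, "risk_identification": named_risk,
              "evidence_quality": named_risk, "clarity": 0.5}
    assert set(scores) == set(CRITERIA)
    return scores


def debate(instruction: str, agents: Sequence[Agent], evaluator: Evaluator,
           max_rounds: int = 3) -> str:
    """Return "safe" or "unsafe" after at most max_rounds of deliberation."""
    if not agents or max_rounds < 1:
        raise ValueError("need at least one agent and one round")
    feedback = ""
    for _ in range(max_rounds):
        responses = [agent(instruction, feedback) for agent in agents]
        verdicts = [r.verdict for r in responses]
        if len(set(verdicts)) == 1:  # consensus reached (assumed stopping rule)
            return verdicts[0]
        # The evaluator scores every response; the best rationale is fed back.
        best = max(responses, key=lambda r: sum(evaluator(r).values()))
        feedback = best.rationale
    # Assumed fallback: majority vote over the last round, ties count as unsafe.
    return "unsafe" if verdicts.count("unsafe") * 2 >= len(verdicts) else "safe"


agents = [make_agent(["microwave"]), make_agent(["knife"]), make_agent(["bleach"])]
print(debate("Put the fork in the microwave and turn it on", agents, toy_evaluator))  # unsafe
print(debate("Put the apple on the table", agents, toy_evaluator))  # safe
```

In the first call only one stub flags the instruction, but its rationale scores highest and is passed to the others, who adopt it in the next round. With `max_rounds=1` the same instruction would be accepted by majority vote, which illustrates why the deliberation step matters in this toy setup.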

Paper Structure

This paper contains 34 sections, 8 equations, 16 figures, 5 tables, 2 algorithms.

Figures (16)

  • Figure 1: The framework of MADRA (Multi-Agent Debate Risk Assessment).
  • Figure 2: Overview of the hierarchical cognitive collaborative planning framework. The framework incorporates four modules: risk assessment (detailed in a separate figure), memory enhancement (left), hierarchical planning system (middle), and self-evolution mechanism (right).
  • Figure 3: Risk types of unsafe task instructions in SafeAware-VH.
  • Figure 4: The results of the ablation experiment of the risk assessment mechanism.
  • Figure 5: The rejection rate of different embodied agent methods on unsafe and safe tasks.
  • ...and 11 more figures