SPOC: Safety-Aware Planning Under Partial Observability And Physical Constraints

Hyungmin Kim, Hobeom Jeon, Dohyung Kim, Minsu Jang, Jeahong Kim

TL;DR

SPOC is a benchmark for safety-aware embodied task planning that integrates strict partial observability, physical constraints, step-by-step planning, and goal-condition-based evaluation.

Abstract

Embodied task planning with large language models faces safety challenges in real-world environments, where partial observability and physical constraints must be respected. Existing benchmarks often overlook these critical factors, limiting their ability to evaluate both feasibility and safety. We introduce SPOC, a benchmark for safety-aware embodied task planning that integrates strict partial observability, physical constraints, step-by-step planning, and goal-condition-based evaluation. Covering diverse household hazards such as fire, fluid, injury, object damage, and pollution, SPOC enables rigorous assessment through both state-based and constraint-based online metrics. Experiments with state-of-the-art LLMs reveal that current models struggle to plan safely, particularly under implicit constraints. Code and dataset are available at https://github.com/khm159/SPOC

Paper Structure (11 sections, 1 figure, 4 tables)

Figures (1)

  • Figure 1: Violations of partial observability (PO, shown in (a)) and physical constraints (PC, shown in (b)) can greatly reduce feasibility. Existing accessible safety-aware embodied task planning (ETP) benchmarks [yin2024safeagentbench, zhu2024earbench, huang2025framework] provide only limited support for PO and PC. In contrast, SPOC enables comprehensive handling of both PO and PC through low-level navigation and interaction actions.