ARGOS: Automated Functional Safety Requirement Synthesis for Embodied AI via Attribute-Guided Combinatorial Reasoning

Dongsheng Chen, Yuxuan Li, Yi Lin, Guanhua Chen, Jiaxin Zhang, Xiangyu Zhao, Lei Ma, Xin Yao, Xuetao Wei

TL;DR

This work tackles the scalability and grounding challenges of functional safety in Embodied AI by introducing ARGOS, a two-stage framework that connects open-ended user instructions to physical risk attributes and regulatory-aligned safety requirements. Stage I grounds semantic entities into fine-grained attributes and uses combinatorial reasoning to discover long-tail hazards, while Stage II constrains hazard reasoning with ISO-like standards and hardware capabilities to synthesize testable FSRs. Extensive experiments demonstrate that ARGOS outperforms baselines in hazard discovery quality, long-tail risk coverage, and FSR generation, with robust results across backbones and favorable human-algorithm alignment. By shifting from semantic label-to-label mappings to attribute-based deduction and constrained synthesis, ARGOS offers a scalable, physically grounded path toward safe industrial deployment of Embodied AI.
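The shift from label-to-label mappings to attribute-based deduction in Stage I can be illustrated with a toy example. The sketch below is hypothetical and is not the authors' implementation. ARGOS performs the entity-to-attribute decomposition with an LLM, whereas here the attribute table (`ENTITY_ATTRIBUTES`) and the pairwise hazard rules (`HAZARD_RULES`) are hand-written assumptions. The sketch shows only the combinatorial step: attributes are paired across every pair of entities, and a hazard is reported whenever an attribute pair matches a rule.

```python
from itertools import combinations, product

# Hypothetical attribute table. In ARGOS this decomposition is produced by an
# LLM; it is hard-coded here purely for illustration.
ENTITY_ATTRIBUTES = {
    "kettle": {"temperature": "hot", "contents": "liquid"},
    "laptop": {"power": "electrical"},
    "child": {"vulnerability": "high", "mobility": "unpredictable"},
    "teapot": {"temperature": "hot", "contents": "liquid"},
}

# Hypothetical hazard rules keyed by pairs of (attribute, value) tuples.
HAZARD_RULES = {
    (("contents", "liquid"), ("power", "electrical")): "liquid spill onto live electronics",
    (("temperature", "hot"), ("vulnerability", "high")): "burn injury to a vulnerable person",
}


def candidate_hazards(entities):
    """Pair attributes across every pair of entities and keep the pairs that match a rule."""
    hazards = []
    for a, b in combinations(entities, 2):
        for x, y in product(ENTITY_ATTRIBUTES[a].items(), ENTITY_ATTRIBUTES[b].items()):
            # Rules are unordered, so check both orientations of the attribute pair.
            rule = HAZARD_RULES.get((x, y)) or HAZARD_RULES.get((y, x))
            if rule:
                hazards.append((a, b, rule))
    return hazards


if __name__ == "__main__":
    print(candidate_hazards(["kettle", "laptop", "child"]))
    # An entity never mentioned in any rule still triggers hazards through its attributes.
    print(candidate_hazards(["teapot", "child"]))
```

Because the rules refer to attributes rather than entity labels, a new entity such as `teapot` inherits the burn hazard without a new rule being written. This is the kind of generalization the attribute-based formulation aims for. The number of attribute pairs checked grows with the product of the entities' attribute counts, which is why the full space is combinatorial.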

Abstract

Ensuring functional safety is essential for deploying Embodied AI in complex open-world environments. However, traditional Hazard Analysis and Risk Assessment (HARA) methods struggle to scale in this domain. HARA enumerates risks over a finite, predefined list of functions, whereas Embodied AI acts on open-ended natural language instructions, so the space of interaction risks grows combinatorially. Although Large Language Models (LLMs) have emerged as a promising way to address this scalability challenge, they often lack physical grounding and yield semantically superficial, incoherent hazard descriptions. To overcome these limitations, we propose ARGOS (AttRibute-Guided cOmbinatorial reaSoning), a framework that bridges the gap between open-ended user instructions and concrete physical attributes. By dynamically decomposing the entities in an instruction into fine-grained physical attributes, ARGOS grounds LLM reasoning in causal risk factors and generates physically plausible hazard scenarios. It then instantiates abstract safety standards, such as ISO 13482, into context-specific Functional Safety Requirements (FSRs) by combining these scenarios with the robot's capabilities. Extensive experiments show that ARGOS produces high-quality FSRs and outperforms baselines in identifying long-tail risks. Overall, this work paves the way for systematic, grounded generation of functional safety requirements, a critical step toward the safe industrial deployment of Embodied AI.
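The Stage II step of turning a hazard scenario into a testable requirement, constrained by what the robot can actually do, can be sketched as follows. This is a minimal illustration under stated assumptions, not the ARGOS method: ARGOS uses LLM reasoning constrained by standards such as ISO 13482, whereas here a single string template (`GENERIC_TEMPLATE`) stands in for a generic standard clause and is not quoted from any standard. The capability table (`ROBOT_CAPABILITIES`), the `Hazard` fields, and the numeric criteria are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hazard:
    scenario: str             # hazard scenario, e.g. produced by Stage I
    required_capability: str  # hypothetical field: capability needed to mitigate it


# Hypothetical stand-in for an abstract, standard-level requirement.
# It is NOT quoted from ISO 13482 or any other standard.
GENERIC_TEMPLATE = "When {condition}, the robot shall {action} {criterion}."

# Hypothetical capability model of one specific robot. The actions and
# quantified criteria are illustrative values only.
ROBOT_CAPABILITIES = {
    "thermal_sensing": {"action": "abort the handover", "criterion": "within 200 ms"},
    "proximity_sensing": {"action": "reduce arm speed", "criterion": "to at most 0.25 m/s"},
}


def synthesize_fsr(hazard: Hazard) -> Optional[str]:
    """Instantiate the generic template for one hazard, or return None if the
    robot lacks the capability needed to implement a mitigation."""
    cap = ROBOT_CAPABILITIES.get(hazard.required_capability)
    if cap is None:
        # Not implementable on this hardware: flag for redesign instead of
        # emitting a requirement the robot cannot satisfy.
        return None
    return GENERIC_TEMPLATE.format(
        condition=hazard.scenario, action=cap["action"], criterion=cap["criterion"]
    )


if __name__ == "__main__":
    print(synthesize_fsr(Hazard("a hot object is held near a person", "thermal_sensing")))
    print(synthesize_fsr(Hazard("a heavy object is lifted over a person", "force_sensing")))
```

The quantified criterion is what makes the resulting requirement testable, and the capability lookup is what keeps it grounded in the hardware. In this sketch, a hazard whose mitigation the robot cannot perform yields no requirement rather than an unverifiable one.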

Paper Structure (38 sections, 4 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: The proposed ARGOS framework: A two-stage pipeline for automated FSR synthesis. Stage I focuses on decomposing semantic entities into physical attributes for combinatorial hazard discovery, while Stage II aligns these hazards with regulatory standards and hardware constraints to generate requirements.
  • Figure 2: Statistical analysis of generation quality. The violin plot illustrates the score density across methods.
  • Figure 3: Qualitative Analysis: Semantic Diversity and Evaluation Alignment.
  • Figure 4: Visualizations for the GPT-4o Backbone. (a) The violin plot confirms that our method maintains a "top-heavy" high-quality distribution even on the stronger GPT-4o model, whereas baselines still exhibit long-tail risks. (b) The t-SNE projection shows that our method (red) covers a distinct and broader semantic space compared to the baselines, consistent with the findings on DeepSeek-V3.2.