LLM-SAP: Large Language Models Situational Awareness Based Planning

Liman Wang; Hanyang Zhong

TL;DR

This work addresses open-world hazard planning for AI agents by modeling hazard scenarios as $s\in S$ with input $x=\{c_1,\ldots,c_k\}\subseteq C$ and producing a plan $\pi:S\to A$. It introduces a cooperative, multi-agent SAP framework in which a generator and an evaluator iteratively generate and critique plans within a state-based planning formalism $M=(S,T,A)$, refining toward an optimal plan $M^*$. A new hazard planning benchmark (24 vignettes across four complexity levels), a 56-action domestic robot action set, and seven evaluation dimensions support assessment with rank-based scoring $R_k$. Results show that SAP prompts and multi-agent collaboration improve latent reasoning and planning quality; commercial LLMs (e.g., GPT-4, GPT-3.5, Claude-2) outperform open-source models, and LLM evaluators provide reliable pairwise judgments. These findings point toward safer, more reliable AI planning in real-world, human-centric environments, and to future work on end-to-end vision-language integration and scalable reasoning architectures.

Abstract

This study explores integrating large language models (LLMs) with situational awareness-based planning (SAP) to enhance the decision-making capabilities of AI agents in dynamic and uncertain environments. We employ a multi-agent reasoning framework to develop a methodology that anticipates and actively mitigates potential risks through iterative feedback and evaluation processes. Our approach diverges from traditional automata theory by incorporating the complexity of human-centric interactions into the planning process, thereby expanding the planning scope of LLMs beyond structured and predictable scenarios. The results demonstrate significant improvements in the model's ability to provide comparatively safe actions within hazard interactions, offering a perspective on proactive and reactive planning strategies. This research highlights the potential of LLMs to perform human-like action planning, thereby paving the way for more sophisticated, reliable, and safe AI systems in unpredictable real-world applications.
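
The generate-and-critique loop over $M=(S,T,A)$ described in the TL;DR can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `PlanningModel`, `refine_plan`, `toy_generator`, and `toy_evaluator`, the state and action labels, and the acceptance rule are hypothetical and are not taken from the paper, its prompts, or its 56-action set. In the framework itself the generator and evaluator would be LLM agents; the toy functions below only stand in for them so the control flow can be run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
Action = str
Plan = Dict[State, Action]  # a plan pi: S -> A, stored as a lookup table


@dataclass
class PlanningModel:
    """M = (S, T, A): states, transitions, and available actions."""
    states: List[State]
    transitions: Dict[Tuple[State, Action], State]
    actions: List[Action]


Generator = Callable[[PlanningModel, str], Plan]
Evaluator = Callable[[PlanningModel, Plan], Tuple[bool, str]]


def refine_plan(model: PlanningModel, generate: Generator,
                evaluate: Evaluator, max_rounds: int = 3) -> Plan:
    """Alternate generation and critique until the evaluator accepts.

    If max_rounds is exhausted, the most recent (unaccepted) plan is returned.
    """
    plan = generate(model, "")  # first draft, no feedback yet
    for _ in range(max_rounds):
        accepted, feedback = evaluate(model, plan)
        if accepted:
            break
        plan = generate(model, feedback)  # revise using the critique
    return plan


# Hypothetical stand-ins for the two LLM agents.
def toy_generator(model: PlanningModel, feedback: str) -> Plan:
    if feedback:  # any critique triggers the revised plan
        return {"child_near_stove": "turn_off_stove", "safe": "observe"}
    return {"child_near_stove": "observe", "safe": "observe"}


def toy_evaluator(model: PlanningModel, plan: Plan) -> Tuple[bool, str]:
    # Assumed rule: every state's chosen action must lead to "safe".
    for state, action in plan.items():
        next_state = model.transitions.get((state, action), state)
        if next_state != "safe":
            return False, f"action '{action}' leaves '{state}' unresolved"
    return True, ""


model = PlanningModel(
    states=["child_near_stove", "safe"],
    transitions={
        ("child_near_stove", "turn_off_stove"): "safe",
        ("safe", "observe"): "safe",
    },
    actions=["observe", "turn_off_stove"],
)

final_plan = refine_plan(model, toy_generator, toy_evaluator)
print(final_plan)
```

In this toy run the first draft only observes the hazard, the evaluator rejects it with a critique, and the second draft is accepted. Capping the loop with `max_rounds` is a design choice of the sketch, not a detail stated in the text above.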

Paper Structure (29 sections, 2 equations, 20 figures, 11 tables, 1 algorithm)

Introduction
Methodology
Task Formulation and Key Challenges
Multi-AI Agents Enhance Reasoning and Accuracy
State-based Planning with Feedback
Formation of Prompts
Experiments
Evaluation Scenarios
Actions Set
Evaluation Dimensions
Evaluation Metrics
Results
LLM Selection
Impact of The SAP Prompt
LLM Evaluators
...and 14 more sections

Figures (20)

Figure 1: Large language models' planning enhancements based on situational awareness.
Figure 2: Iteratively generating and evaluating plans in a multi-agent proactive AI system produces iterative feedback to enhance reasoning and factual accuracy. † refers to selecting the human demo for comparison only in the first round of iteration.
Figure 3: Actions set. The actions are divided into 6 categories, covering the common actions of housekeeper robots.
Figure 4: Actions, and the objects they involve, in the 24 home hazard scenarios.
Figure 5: Distribution of model rankings across the 24 scenarios. Rankings are rounded.
...and 15 more figures
