LLM-SAP: Large Language Models Situational Awareness Based Planning
Liman Wang, Hanyang Zhong
TL;DR
This work addresses open-world hazard planning for AI agents by modeling hazard scenarios as states $s\in S$, with input $x=\{c_1,\dots,c_k\}\subseteq C$, and producing a plan $\pi:S\to A$. It introduces a cooperative, multi-agent situational-awareness-based planning (SAP) framework in which a generator and an evaluator iteratively generate and critique plans within a state-based planning formalism $M=(S,T,A)$, refining them toward an optimal plan $M^*$. A new hazard planning benchmark (24 vignettes across four complexity levels), together with a 56-action domestic robot action set and seven evaluation dimensions, supports rigorous assessment using rank-based scoring $R_k$. Results show that SAP prompts and multi-agent collaboration improve latent reasoning and planning quality, that commercial LLMs (e.g., GPT-4, GPT-3.5, Claude-2) outperform open-source models, and that LLM evaluators provide reliable pairwise judgments. These findings move toward safer, more reliable AI planning in real-world, human-centric environments and point to future work on end-to-end vision-language integration and scalable reasoning architectures.
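The generator-evaluator cycle described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the generator and evaluator here are deterministic toy stubs standing in for LLM agents, a plan is simplified to an ordered list of action names rather than a full mapping $\pi:S\to A$, and the names `refine_plan`, `Critique`, `toy_generator`, `toy_evaluator`, `approach_stove`, and `turn_off_stove` are all hypothetical (none are taken from the paper or its 56-action set).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Plan = List[str]  # simplification: a plan as an ordered list of action names from A


@dataclass
class Critique:
    score: float    # evaluator's quality estimate; higher is better
    feedback: str   # textual critique passed back to the generator
    accept: bool    # True when the evaluator judges the plan good enough


def refine_plan(
    scenario: str,
    generate: Callable[[str, Optional[str]], Plan],
    evaluate: Callable[[str, Plan], Critique],
    max_rounds: int = 5,
) -> Tuple[Plan, float]:
    """Alternate generation and critique; return the best-scoring plan seen."""
    best_plan: Plan = []
    best_score = float("-inf")
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        plan = generate(scenario, feedback)    # generator proposes a plan
        critique = evaluate(scenario, plan)    # evaluator critiques it
        if critique.score > best_score:        # keep the best plan so far
            best_plan, best_score = plan, critique.score
        if critique.accept:                    # stop once the evaluator accepts
            break
        feedback = critique.feedback           # feed the critique into the next round
    return best_plan, best_score


# Hypothetical toy agents standing in for LLM generator/evaluator prompts.
def toy_generator(scenario: str, feedback: Optional[str]) -> Plan:
    plan = ["approach_stove"]
    if feedback and "turn off" in feedback:
        plan.append("turn_off_stove")
    return plan


def toy_evaluator(scenario: str, plan: Plan) -> Critique:
    safe = "turn_off_stove" in plan
    return Critique(
        score=1.0 if safe else 0.0,
        feedback="" if safe else "turn off the heat source first",
        accept=safe,
    )


plan, score = refine_plan("pot boiling over on stove", toy_generator, toy_evaluator)
print(plan, score)  # prints ['approach_stove', 'turn_off_stove'] 1.0
```

Keeping the best-scoring plan rather than only the last one guards against a later round regressing when the loop hits `max_rounds` without the evaluator accepting.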
Abstract
This study explores integrating large language models (LLMs) with situational-awareness-based planning (SAP) to enhance the decision-making capabilities of AI agents in dynamic and uncertain environments. We employ a multi-agent reasoning framework to develop a methodology that anticipates and actively mitigates potential risks through iterative feedback and evaluation. Our approach diverges from traditional automata theory by incorporating the complexity of human-centric interactions into the planning process, thereby extending the planning scope of LLMs beyond structured and predictable scenarios. The results demonstrate significant improvements in the model's ability to provide comparatively safe actions in hazardous interactions, offering a perspective on proactive and reactive planning strategies. This research highlights the potential of LLMs to perform human-like action planning, paving the way for more sophisticated, reliable, and safe AI systems in unpredictable real-world applications.
