Agent Planning with World Knowledge Model

Shuofei Qiao; Runnan Fang; Ningyu Zhang; Yuqi Zhu; Xiang Chen; Shumin Deng; Yong Jiang; Pengjun Xie; Fei Huang; Huajun Chen

TL;DR

This paper addresses the brittleness of open-source LLM agents in real-world planning by introducing a parametric World Knowledge Model (WKM) that provides global task knowledge and local state knowledge. The approach trains the agent and the WKM through task-knowledge synthesis and state-knowledge summarization, and adds a retrieval-based constraint mechanism to mitigate hallucinated actions. Experiments on ALFWorld, WebShop, and ScienceWorld with three open-source models (Mistral-7B, Gemma-7B, and Llama-3-8B) show substantial gains over baselines, strong generalization to unseen tasks, and evidence that instance-level knowledge generalizes better than hand-crafted dataset-level knowledge. The work also shows that a weak WKM can guide a stronger agent model and that multi-task WKM training is beneficial, and it cautions against supplying state knowledge to the agent as explicit prompts, highlighting WKM's potential for scalable, robust agent planning.
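The retrieval-based constraint can be pictured as mixing two distributions over candidate next actions: the agent model's own probabilities and a distribution read off the nearest entries of a state knowledge base. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each knowledge-base entry is a hypothetical (state embedding, next action) pair, scores entries by cosine similarity, turns the top-k neighbours' actions into frequencies, and combines the two distributions with a hypothetical weight `gamma`. The names `kb_action_distribution` and `constrained_action` are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical knowledge-base layout: (state embedding, next action taken from that state).
Entry = Tuple[Sequence[float], str]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kb_action_distribution(query: Sequence[float], kb: List[Entry], k: int = 3) -> Dict[str, float]:
    """Action frequencies among the k entries most similar to the current state (assumed scheme)."""
    nearest = sorted(kb, key=lambda entry: cosine(query, entry[0]), reverse=True)[:k]
    counts = Counter(action for _, action in nearest)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}


def constrained_action(agent_probs: Dict[str, float], kb_probs: Dict[str, float], gamma: float = 0.5) -> str:
    """Pick the action maximizing gamma * P_agent(a) + (1 - gamma) * P_kb(a)."""
    candidates = sorted(set(agent_probs) | set(kb_probs))  # sorted for deterministic tie-breaking
    return max(candidates, key=lambda a: gamma * agent_probs.get(a, 0.0) + (1 - gamma) * kb_probs.get(a, 0.0))


if __name__ == "__main__":
    kb = [([1.0, 0.0], "go to cabinet 1"), ([0.9, 0.1], "go to cabinet 1"), ([0.0, 1.0], "take apple")]
    kb_probs = kb_action_distribution([1.0, 0.0], kb, k=2)
    agent_probs = {"take apple": 0.6, "go to cabinet 1": 0.4}
    print(kb_probs)                                        # {'go to cabinet 1': 1.0}
    print(constrained_action(agent_probs, kb_probs, 0.5))  # go to cabinet 1
    print(constrained_action(agent_probs, kb_probs, 1.0))  # take apple
```

With `gamma=1.0` the knowledge base is ignored and the agent's preferred action wins; lowering `gamma` lets the retrieved neighbours overrule an action the agent favours but that similar past states never led to.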

Abstract

Recent efforts to use large language models (LLMs) directly as agent models for interactive planning tasks have shown commendable results. Despite these achievements, such agents still struggle with blind trial-and-error in global planning and with hallucinated actions in local planning, owing to their poor understanding of the "real" physical world. Inspired by the mental world knowledge model that humans rely on, which provides global prior knowledge before a task and maintains local dynamic knowledge during it, we introduce a parametric World Knowledge Model (WKM) to facilitate agent planning. Concretely, we steer the agent model to self-synthesize knowledge from both expert and sampled trajectories. We then develop the WKM to provide prior task knowledge that guides global planning and dynamic state knowledge that assists local planning. Experimental results on three complex real-world simulated datasets with three state-of-the-art open-source LLMs (Mistral-7B, Gemma-7B, and Llama-3-8B) demonstrate that our method achieves superior performance compared to various strong baselines. Our analysis further shows that the WKM effectively alleviates the blind trial-and-error and hallucinated-action issues, providing strong support for the agent's understanding of the world. Other interesting findings include: 1) our instance-level task knowledge generalizes better to unseen tasks, 2) a weak WKM can guide the planning of a strong agent model, and 3) unified WKM training has promising potential for further development. The code is available at https://github.com/zjunlp/WKM.
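The division of labour described in the abstract (task knowledge once before the task, state knowledge at every step) can be sketched as a control loop. This is a hypothetical outline, not the released code: every callable (`task_knowledge_fn`, `state_knowledge_fn`, `agent_fn`, `select_fn`, `env_step`) is an assumed interface standing in for a model or an environment. Following the TL;DR's caution about explicit state knowledge prompts, this sketch passes state knowledge only to the action-selection step rather than into the agent's input; that routing is an assumption of the sketch.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]  # (action, observation) pairs


def plan_with_wkm(
    task: str,
    task_knowledge_fn: Callable[[str], str],                    # hypothetical WKM call: prior task knowledge
    state_knowledge_fn: Callable[[str, History], str],          # hypothetical WKM call: current state knowledge
    agent_fn: Callable[[str, str, History], Dict[str, float]],  # hypothetical agent: action -> probability
    select_fn: Callable[[str, Dict[str, float]], str],          # hypothetical constraint using state knowledge
    env_step: Callable[[str], Tuple[str, bool]],                # hypothetical environment: (observation, done)
    max_steps: int = 10,
) -> History:
    # Global planning: task knowledge is generated once, before the first action.
    task_knowledge = task_knowledge_fn(task)
    history: History = []
    for _ in range(max_steps):
        # Local planning: summarize the current state from the trajectory so far.
        state_knowledge = state_knowledge_fn(task, history)
        # The agent sees the task knowledge, but not the state knowledge.
        action_probs = agent_fn(task, task_knowledge, history)
        # State knowledge only influences which candidate action is chosen.
        action = select_fn(state_knowledge, action_probs)
        observation, done = env_step(action)
        history.append((action, observation))
        if done:
            break
    return history


if __name__ == "__main__":
    trajectory = plan_with_wkm(
        task="put an apple on the table",
        task_knowledge_fn=lambda task: "locate the apple before trying to take it",
        state_knowledge_fn=lambda task, history: f"step {len(history)}",
        agent_fn=lambda task, knowledge, history: {"look": 0.5, "take apple": 0.5},
        select_fn=lambda state, probs: "take apple" if state == "step 1" else "look",
        env_step=lambda action: ("got apple", True) if action == "take apple" else ("you see an apple", False),
    )
    print(trajectory)  # [('look', 'you see an apple'), ('take apple', 'got apple')]
```

The toy lambdas only exercise the control flow; in a real system each would wrap a model or simulator call.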

Paper Structure

This paper contains 46 sections, 12 equations, 13 figures, and 7 tables.

Introduction
Preliminaries
World Knowledge Model.
Method
Task Knowledge Synthesis
Experienced Agent Exploration.
Self Knowledge Synthesis.
State Knowledge Summarization
State Knowledge Base Construction.
Model Training
Agent Model Training.
World Knowledge Model Training.
Agent Planning with World Knowledge Model
Experiments
Experimental Settings
(31 further sections not listed)

Figures (13)

Figure 1: Traditional agent planning vs. Agent planning with world knowledge model.
Figure 2: Overview of our WKM. We train a world knowledge model on the knowledge synthesized by the agent model itself from both expert and explored trajectories, providing prior task knowledge to guide global planning and dynamic state knowledge to assist local planning.
Figure 3: Ablation study on Mistral-7B. w/o all denotes the vanilla experienced agent model trained on expert trajectories only. w/ state tests the agent model with state knowledge base constraints only. w/ task guides the agent model with task knowledge only. w/ task&state is our full WKM, with both task knowledge guidance and state knowledge constraints. w/o rejected synthesizes task knowledge from expert trajectories alone. merge trains the WKM and the agent model together as a single model. prompt replaces the WKM with few-shot prompts that provide the knowledge.
Figure 4: Performance of human-designed dataset-level knowledge compared to WKM-generated instance-level knowledge.
Figure 5: Relative performance of multi-task WKM compared to various baselines.
(8 further figures not listed)
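Figure 2 and the "w/o rejected" ablation in Figure 3 hinge on what the agent model sees when it synthesizes task knowledge: an expert trajectory paired with a sampled (rejected) one, or the expert trajectory alone. The helper below is a hypothetical illustration of that input contract; the function name `build_synthesis_prompt` and the instruction wording are assumptions, not the paper's actual prompt.

```python
from typing import List, Optional


def build_synthesis_prompt(task: str, expert: List[str], rejected: Optional[List[str]] = None) -> str:
    """Hypothetical prompt asking the agent model to write task knowledge for one task instance."""
    lines = [f"Task: {task}", "Expert trajectory:"]
    lines += [f"  {i}. {step}" for i, step in enumerate(expert, start=1)]
    if rejected is not None:
        # Full setting: contrast the expert run with a sampled run treated as rejected.
        lines.append("Rejected trajectory:")
        lines += [f"  {i}. {step}" for i, step in enumerate(rejected, start=1)]
        lines.append("Compare the two trajectories and state the task knowledge that explains why the expert one succeeds.")
    else:
        # 'w/o rejected' ablation: knowledge is synthesized from the expert trajectory alone.
        lines.append("State the task knowledge demonstrated by the expert trajectory.")
    return "\n".join(lines)


if __name__ == "__main__":
    expert = ["go to countertop 1", "take apple 1 from countertop 1"]
    rejected = ["go to drawer 1", "open drawer 1"]
    print(build_synthesis_prompt("find an apple", expert, rejected))
    print(build_synthesis_prompt("find an apple", expert))  # 'w/o rejected' variant
```

In this sketch the prompt is built per task instance rather than once per dataset, which is the distinction Figure 4 draws between instance-level and dataset-level knowledge.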
