DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration

Narjes Nourzad, Hanqing Yang, Shiyu Chen, Carlee Joe-Wong

TL;DR

DR. WELL addresses cooperative multi-agent planning under partial information and limited communication by coupling embodied LLM agents with a two-phase negotiation protocol and a dynamic symbolic world memory. The framework decentralizes task allocation and planning through proposals and commitments, grounded in a shared symbolic graph that records past experiences, prototypes, and outcomes. Empirical results in cooperative block-push tasks show that DR. WELL improves task completion rates and efficiency, while avoiding brittle trajectory-level alignment through symbolic abstraction and negotiation-aware planning. The approach yields interpretable, reusable coordination patterns and scalable performance as team size grows.

Abstract

Cooperative multi-agent planning requires agents to make joint decisions with partial information and limited communication. Coordination at the trajectory level often fails, as small deviations in timing or movement cascade into conflicts. Symbolic planning mitigates this challenge by raising the level of abstraction and providing a minimal vocabulary of actions that enable synchronization and collective progress. We present DR. WELL, a decentralized neurosymbolic framework for cooperative multi-agent planning. Cooperation unfolds through a two-phase negotiation protocol: agents first propose candidate roles with reasoning and then commit to a joint allocation under consensus and environment constraints. After commitment, each agent independently generates and executes a symbolic plan for its role without revealing detailed trajectories. Plans are grounded in execution outcomes via a shared world model that encodes the current state and is updated as agents act. By reasoning over symbolic plans rather than raw trajectories, DR. WELL avoids brittle step-level alignment and enables higher-level operations that are reusable, synchronizable, and interpretable. Experiments on cooperative block-push tasks show that agents adapt across episodes: the dynamic world model captures reusable patterns and improves task completion rates and efficiency through negotiation and self-refinement, trading a time overhead for evolving, more efficient collaboration strategies.
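
The two-phase negotiation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Proposal` record, the `commit` function, the `required_agents` table (how many agents a block needs), and the vote-based tie-breaking rule are all assumptions introduced here for illustration. In the paper's setting the reasoning string would come from an LLM.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Proposal phase: one agent suggests a task (a block ID) with its reasoning."""
    agent_id: str
    block_id: str
    reasoning: str  # free text; hypothetical stand-in for LLM output


def commit(proposals, required_agents):
    """Commit phase (hypothetical rule): derive one joint allocation.

    Blocks are visited from most- to least-proposed (ties broken by block ID).
    A block is adopted only if enough agents are still free to meet its
    requirement; agents who proposed that block are assigned to it first.
    Assumes each agent submits exactly one proposal.
    """
    votes = Counter(p.block_id for p in proposals)
    choice = {p.agent_id: p.block_id for p in proposals}
    free = [p.agent_id for p in proposals]
    allocation = {agent: None for agent in free}  # None means "stays idle"

    for block in sorted(votes, key=lambda b: (-votes[b], b)):
        need = required_agents[block]
        if len(free) < need:
            continue  # environment constraint cannot be met; skip this block
        # Stable sort: agents who proposed this block come first.
        ranked = sorted(free, key=lambda a: choice[a] != block)
        for agent in ranked[:need]:
            allocation[agent] = block
            free.remove(agent)
    return allocation


if __name__ == "__main__":
    proposals = [
        Proposal("a1", "B1", "B1 is closest to me"),
        Proposal("a2", "B2", "B2 is light enough to push alone"),
    ]
    # B1 needs two agents, so a2 is pulled onto B1 and B2 is deferred.
    print(commit(proposals, {"B1": 2, "B2": 1}))
```

The point of the sketch is that agents exchange only task-level choices and short justifications; no trajectories are shared, and each agent leaves the negotiation with a single commitment.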

Paper Structure

This paper contains 24 sections, 10 equations, 12 figures.

Figures (12)

  • Figure 1: Workflow of the DR. WELL framework. In Step 1, an agent enters the communication room with other idle agents to begin negotiation and exits with a single commitment. For instance, in a block-pushing environment, the commitment may be represented by a block ID, which the agent then tries to move to the goal zone. It then reasons and generates a plan to accomplish the committed task. In Step 2, the agent refines the plan using the world model, which pushes the plan toward a more effective form. Once the plan is revised, the controller validates it. Interaction with the environment occurs by decomposing symbolic actions into their primitive form, while the environment simultaneously updates the agent’s understanding as other agents execute their own plans.
  • Figure 2: Two-stage negotiation protocol. In the proposal stage, agents suggest candidate tasks with reasoning about feasibility and coordination needs. In the commit stage, they converge on a joint decision and specialize roles before independently planning over the shared world model.
  • Figure 3: Post-commitment planning and execution cycle in a block-pushing environment. Each color represents a different agent, showing its planning and execution timeline. The speech bubble marks the communication room where agents synchronize and negotiate before proceeding with execution. After consensus, each agent expands its assigned role into a full symbolic plan and executes it independently. These plans are shown on the left as sequences of white rectangles, each representing a symbolic action (e.g., Push), whose varying widths indicate different durations. Once a plan is completed, the agent re-enters the communication room; the environment then pauses until all idle agents resynchronize, briefly suspending those still mid-plan. On the right, snapshots at different timesteps ($t=5,10,20$) show how agents coordinate to approach and push blocks toward the goal zone, where completed blocks are marked DONE and excluded from future negotiations.
  • Figure 4: View of the world model (WM), showing how it organizes knowledge across multiple layers: task and communication history, plan prototypes, and detailed plan instances. These layers act as a shared memory that aggregates state and experience, ensuring agents draw on the same knowledge base while keeping execution decentralized and private.
  • Figure 5: Two-agent game with a max_steps limit of 150. Communication events are marked along the top of the timeline. For each agent, timelines record task allocations, outcomes, and the sequence of symbolic actions (e.g., MoveToBlock, Push, Rendezvous).
  • ...and 7 more figures
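
The layered world model described in the Figure 4 caption (task and communication history, plan prototypes, plan instances) can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class and method names (`WorldModel`, `PlanInstance`, `record_plan`, `success_rate`, `best_prototype`), the choice to define a prototype as the sequence of symbolic action names, and the success-rate ranking are all hypothetical and not taken from the paper. The action names reuse those mentioned in the Figure 5 caption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanInstance:
    """Instance layer: one executed plan and its outcome."""
    agent_id: str
    block_id: str
    actions: tuple  # e.g. (("MoveToBlock", "B1"), ("Push", "B1"))
    success: bool
    steps: int


class WorldModel:
    """Hypothetical shared memory with three layers, as a rough analogy."""

    def __init__(self):
        # History layer: (episode, agent, block) commitments from negotiation.
        self.history = []
        # Prototype layer: action-name sequence -> list of plan instances.
        self.prototypes = defaultdict(list)

    def record_commitment(self, episode, agent_id, block_id):
        self.history.append((episode, agent_id, block_id))

    def record_plan(self, instance):
        # Assumption: a prototype abstracts a plan to its action names only.
        prototype = tuple(name for name, _ in instance.actions)
        self.prototypes[prototype].append(instance)

    def success_rate(self, prototype):
        runs = self.prototypes.get(prototype, [])
        if not runs:
            return None
        return sum(r.success for r in runs) / len(runs)

    def best_prototype(self):
        """Return the prototype with the highest observed success rate."""
        ranked = [(self.success_rate(p), p) for p in self.prototypes]
        return max(ranked)[1] if ranked else None


if __name__ == "__main__":
    wm = WorldModel()
    wm.record_commitment(episode=0, agent_id="a1", block_id="B1")
    wm.record_plan(PlanInstance(
        "a1", "B1", (("MoveToBlock", "B1"), ("Push", "B1")), False, 40))
    wm.record_plan(PlanInstance(
        "a1", "B1",
        (("MoveToBlock", "B1"), ("Rendezvous", "B1"), ("Push", "B1")),
        True, 55))
    print(wm.best_prototype())
```

An agent refining its plan could consult such a structure to prefer prototypes that succeeded in earlier episodes, while its own execution details stay private.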