LLaMAR: Long-Horizon Planning for Multi-Agent Robots in Partially Observable Environments

Siddharth Nayak; Adelmo Morrison Orozco; Marina Ten Have; Vittal Thirumalai; Jackson Zhang; Darren Chen; Aditya Kapoor; Eric Robinson; Karthik Gopalakrishnan; James Harrison; Brian Ichter; Anuj Mahajan; Hamsa Balakrishnan

TL;DR

LLaMAR presents a centralized, LM-driven plan-act-correct-verify architecture for long-horizon, multi-agent robotics in partially observable environments. By decoupling reasoning across Planner, Actor, Corrector, and Verifier modules and enabling real-time correction from action feedback, it forgoes oracle simulators while leveraging a semantic exploration strategy and Sentence-BERT action mapping. MAP-THOR and SAR benchmarks demonstrate substantial performance gains (≈30% higher success rates) over state-of-the-art LM-based planners, with ablations confirming the contribution of each module. This work advances practical multi-agent planning with LMs, reducing reliance on privileged information and moving closer to robust real-world deployment.

Abstract

The ability of Language Models (LMs) to understand natural language makes them a powerful tool for parsing human instructions into task plans for autonomous robots. Unlike traditional planning methods that rely on domain-specific knowledge and handcrafted rules, LMs generalize from diverse data and adapt to various tasks with minimal tuning, acting as a compressed knowledge base. However, LMs in their standard form face challenges with long-horizon tasks, particularly in partially observable multi-agent settings. We propose an LM-based Long-Horizon Planner for Multi-Agent Robotics (LLaMAR), a cognitive architecture for planning that achieves state-of-the-art results in long-horizon tasks within partially observable environments. LLaMAR employs a plan-act-correct-verify framework, allowing self-correction from action execution feedback without relying on oracles or simulators. Additionally, we present MAP-THOR, a comprehensive test suite encompassing household tasks of varying complexity within the AI2-THOR environment. Experiments show that LLaMAR achieves a 30% higher success rate than other state-of-the-art LM-based multi-agent planners in MAP-THOR and Search & Rescue tasks. Code can be found at https://github.com/nsidn98/LLaMAR
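To make the plan-act-correct-verify framework concrete, the following is a minimal sketch of one possible control loop. It is not the authors' implementation: the `Memory` layout, the `run_episode` signature, and all `toy_*` functions are hypothetical, and the four modules are plain Python callables standing in for LM calls. The toy world mirrors the fridge example from the Corrector figure, where placing a plate fails until the fridge is opened.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Actions = Dict[str, str]    # agent name -> high-level action
Feedback = Dict[str, bool]  # agent name -> whether the action succeeded


@dataclass
class Memory:
    """Shared state passed between modules (hypothetical layout)."""
    open_subtasks: List[str] = field(default_factory=list)
    completed_subtasks: List[str] = field(default_factory=list)
    corrections: List[str] = field(default_factory=list)


def run_episode(
    instruction: str,
    planner: Callable[[str, Memory], List[str]],
    actor: Callable[[Memory], Actions],
    execute: Callable[[Actions], Feedback],
    corrector: Callable[[Actions, Feedback], List[str]],
    verifier: Callable[[Actions, Feedback], List[str]],
    max_steps: int = 10,
) -> Memory:
    memory = Memory()
    for _ in range(max_steps):
        # Plan: refresh the list of subtasks that are still open.
        memory.open_subtasks = planner(instruction, memory)
        if not memory.open_subtasks:
            break
        # Act: choose one high-level action per agent and execute it.
        actions = actor(memory)
        feedback = execute(actions)
        # Correct: turn execution failures into suggestions for the next step.
        memory.corrections = corrector(actions, feedback)
        # Verify: record subtasks that the feedback confirms as done.
        for subtask in verifier(actions, feedback):
            if subtask not in memory.completed_subtasks:
                memory.completed_subtasks.append(subtask)
    return memory


# --- Toy stand-ins for the LM modules and the environment (assumptions) ---
world = {"fridge_open": False}
GOALS = ["open fridge", "put plate in fridge"]


def toy_planner(instruction: str, memory: Memory) -> List[str]:
    return [g for g in GOALS if g not in memory.completed_subtasks]


def toy_actor(memory: Memory) -> Actions:
    # Follow a pending correction if there is one; otherwise naively
    # attempt the last open subtask, which fails while the fridge is closed.
    if memory.corrections:
        return {"Alice": memory.corrections[0]}
    return {"Alice": memory.open_subtasks[-1]}


def toy_execute(actions: Actions) -> Feedback:
    feedback = {}
    for agent, action in actions.items():
        if action == "open fridge":
            world["fridge_open"] = True
            feedback[agent] = True
        else:  # "put plate in fridge" only succeeds once the fridge is open
            feedback[agent] = world["fridge_open"]
    return feedback


def toy_corrector(actions: Actions, feedback: Feedback) -> List[str]:
    return ["open fridge" for agent, action in actions.items()
            if action == "put plate in fridge" and not feedback[agent]]


def toy_verifier(actions: Actions, feedback: Feedback) -> List[str]:
    return [action for agent, action in actions.items() if feedback[agent]]


result = run_episode("Put the plate in the fridge", toy_planner, toy_actor,
                     toy_execute, toy_corrector, toy_verifier)
print(result.completed_subtasks)
```

In this toy run the first action fails, the corrector proposes opening the fridge, and the remaining subtasks then complete in order. The loop itself never consults an oracle or simulator; it relies only on the success or failure feedback returned by `execute`.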

Paper Structure (32 sections, 6 figures, 8 tables, 3 algorithms)

Introduction
Related Work
Background
Approach
Experiments
Results and Discussion
Limitations and Future Work
Conclusion
Terminology
MAP-THOR Environment
Observation Space
Action Space
MAP-THOR Task Types
Search & Rescue Environment (SAR)
Multi-agent Nature
...and 17 more sections

Figures (6)

Figure 1: An overview of LLaMAR's modular cognitive architecture. LLaMAR leverages LMs within four key modules: Planner, Actor, Corrector, and Verifier, each with specific roles. The Planner breaks down the high-level language instruction into feasible subtasks to achieve the environment goal. The Actor determines the high-level actions each agent should perform. These actions trigger low-level policies that generate and execute a sequence of primitive actions in sync across all agents. Based on execution feedback, the Corrector suggests corrections for high-level actions and the Verifier Module validates completion of subtasks.
Figure 2: A few examples of the Corrector module mitigating failures in actions predicted by the Actor module. (a) The Corrector suggests getting closer to the agent before attempting to pick it up. (b) The Corrector recommends opening the fridge because the previous action of placing the plate failed. (c) The Corrector advises rotating right so that the agent can access the table and place the tissue box on it, after the low-level navigation policy failed to find a path to the table.
Figure 3: Photorealistic rendering of household scenarios in the AI2-THOR simulator enables the use of multiple autonomous robots to carry out daily tasks.
Figure 4: Choice of direction for the exploration heuristic: The agent (Alice) rotates towards the 4 cardinal directions to get observations. The cosine similarities between the CLIP embeddings $I_d$ of these 4 images and the CLIP embeddings of each subtask in the open subtasks set $\mathcal{G}_O$ are computed to obtain the exploration score $\mathcal{E}_d$ for each direction. The direction with the highest $\mathcal{E}_d$ is chosen for exploration, and the agent moves $J=2$ steps in that direction.
Figure 5: The search & rescue environment consists of multiple drones in an unknown environment with missing people, fires of different types, and water and sand reservoirs.
...and 1 more figure
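The exploration heuristic described in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the short vectors are made-up stand-ins for CLIP embeddings, the function names are hypothetical, and summing the per-subtask cosine similarities to obtain $\mathcal{E}_d$ is an assumed aggregation that the caption does not spell out.

```python
import math
from typing import Dict, List, Sequence

J = 2  # number of steps taken in the chosen direction, as in the caption


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def exploration_scores(
    image_embs: Dict[str, Sequence[float]],
    subtask_embs: List[Sequence[float]],
) -> Dict[str, float]:
    """Score each direction d by summing cos(I_d, g) over open subtasks g.

    The sum is an assumed aggregation over the open subtask set.
    """
    return {
        direction: sum(cosine(i_d, g) for g in subtask_embs)
        for direction, i_d in image_embs.items()
    }


def choose_direction(
    image_embs: Dict[str, Sequence[float]],
    subtask_embs: List[Sequence[float]],
) -> str:
    """Return the direction with the highest exploration score."""
    scores = exploration_scores(image_embs, subtask_embs)
    return max(scores, key=scores.get)


# Made-up 3-d vectors standing in for CLIP image and text embeddings.
image_embs = {
    "north": [1.0, 0.0, 0.0],
    "east": [0.0, 1.0, 0.0],
    "south": [0.0, 0.0, 1.0],
    "west": [1.0, 1.0, 0.0],
}
open_subtask_embs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]

scores = exploration_scores(image_embs, open_subtask_embs)
best = choose_direction(image_embs, open_subtask_embs)
print(f"move {J} steps towards {best}")
```

With real embeddings, `image_embs` would hold one CLIP image embedding per cardinal view and `open_subtask_embs` one CLIP text embedding per open subtask; the selection logic stays the same.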
