AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning

Jeric Lew; Yuhong Cao; Derek Ming Siang Tan; Guillaume Sartoretti

TL;DR

Multi-agent informative path planning (MAIPP) requires coordinated information gathering under strict budgets, and planning is hard because the belief over the environment changes with every new measurement. The paper introduces AID, a fully decentralized diffusion-based framework that first behavior-clones trajectories from existing MAIPP planners and then fine-tunes the policy online with Diffusion Policy Policy Optimization (DPPO), yielding long-horizon intents generated non-autoregressively. It reports up to 4x faster execution and up to 17% more information gain than the planners it is trained from, while scaling to larger numbers of agents, using a PRM-augmented, GP-informed observation pipeline and a graph-attention encoder. These results point to diffusion policies as a promising tool for robust, scalable multi-agent coordination in informative path planning, with potential extensions to 3D environments and real-robot experiments.

Abstract

Information gathering in large-scale or time-critical scenarios (e.g., environmental monitoring, search and rescue) requires broad coverage within limited time budgets, motivating the use of multi-agent systems. These scenarios are commonly formulated as multi-agent informative path planning (MAIPP), where multiple agents must coordinate to maximize information gain while operating under budget constraints. A central challenge in MAIPP is ensuring effective coordination while the belief over the environment evolves with incoming measurements. Recent learning-based approaches address this by using distributions over future positions as "intent" to support coordination. However, these autoregressive intent predictors are computationally expensive and prone to compounding errors. Inspired by the effectiveness of diffusion models as expressive, long-horizon policies, we propose AID, a fully decentralized MAIPP framework that leverages diffusion models to generate long-term trajectories in a non-autoregressive manner. AID first performs behavior cloning on trajectories produced by existing MAIPP planners and then fine-tunes the policy using reinforcement learning via Diffusion Policy Policy Optimization (DPPO). This two-stage pipeline enables the policy to inherit expert behavior while learning improved coordination through online reward feedback. Experiments demonstrate that AID consistently improves upon the MAIPP planners it is trained from, achieving up to 4x faster execution and 17% increased information gain, while scaling effectively to larger numbers of agents. Our implementation is publicly available at https://github.com/marmotlab/AID.
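To make "non-autoregressive" concrete, the following is a minimal sketch of a standard DDPM-style reverse process that denoises every waypoint of a trajectory jointly, rather than predicting waypoints one at a time. It is not AID's implementation: the function names, the linear noise schedule, and the step count are assumptions for illustration, and `toy_noise_predictor` is a hypothetical oracle standing in for the learned, observation-conditioned network.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative products of (1 - beta)."""
    betas = [beta_start + (beta_end - beta_start) * k / (num_steps - 1)
             for k in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_noise_predictor(x_t, k, alpha_bars, target):
    """Hypothetical stand-in for a trained network: returns the noise that
    would make x_t a noised copy of `target` at diffusion step k."""
    scale = math.sqrt(alpha_bars[k])
    sigma = math.sqrt(1.0 - alpha_bars[k])
    return [[(x - scale * t) / sigma for x, t in zip(x_row, t_row)]
            for x_row, t_row in zip(x_t, target)]


def sample_intent(target, num_steps=50, seed=0):
    """Denoise all waypoints at once, starting from Gaussian noise."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    # One noise sample per coordinate of every waypoint in the horizon.
    x = [[rng.gauss(0.0, 1.0) for _ in row] for row in target]
    for k in reversed(range(num_steps)):
        eps = toy_noise_predictor(x, k, alpha_bars, target)
        coef = betas[k] / math.sqrt(1.0 - alpha_bars[k])
        inv = 1.0 / math.sqrt(1.0 - betas[k])
        sigma = math.sqrt(betas[k]) if k > 0 else 0.0  # no noise at last step
        x = [[inv * (xi - coef * ei) + sigma * rng.gauss(0.0, 1.0)
              for xi, ei in zip(x_row, e_row)]
             for x_row, e_row in zip(x, eps)]
    return x


# Assumed toy "expert" trajectory: 8 waypoints on a straight 2D line.
target = [[i / 7.0, 0.5 * i / 7.0] for i in range(8)]
trajectory = sample_intent(target)
```

Each denoising step updates the whole trajectory, so an error in one waypoint is not fed forward into the next prediction as it would be in an autoregressive rollout. In a learned policy the oracle would be replaced by a network trained first by behavior cloning and then fine-tuned with reinforcement learning, as the abstract describes.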
