AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning

Jeric Lew, Yuhong Cao, Derek Ming Siang Tan, Guillaume Sartoretti

TL;DR

MAIPP requires coordinated information gathering under strict budgets, and planning is made harder by a belief over the environment that keeps changing as measurements arrive. The paper introduces AID, a decentralized diffusion-based framework that first behavior-clones existing planners and then fine-tunes online via DPPO, yielding long-horizon, non-autoregressive intents. It reports up to 4x faster execution and up to 17% more information gain than the planners it is trained from, while scaling to larger numbers of agents, using a PRM-augmented, GP-informed observation pipeline and a graph-attention encoder. These results suggest that diffusion policies are a promising tool for robust, scalable multi-agent coordination in informative path planning, with potential extensions to 3D environments and real-robot experiments.
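The two-stage training can be written, as a hedged sketch, with the standard denoising objective that diffusion policies use for behavior cloning and a PPO-style clipped objective applied to individual denoising steps, which is how DPPO fine-tunes a diffusion policy. The symbols (trajectory $\tau$, observation $o$, noise predictor $\epsilon_\theta$, cumulative schedule $\bar\alpha_k$, advantage $\hat{A}$, clip range $\delta$) are our notation, not necessarily the paper's, and AID's exact losses may differ.

```latex
% Stage 1: behavior cloning on expert trajectories \tau^0 (epsilon-prediction denoising loss)
\mathcal{L}_{\mathrm{BC}}(\theta) = \mathbb{E}_{k,\,\tau^0,\,\epsilon \sim \mathcal{N}(0, I)}
  \left[ \left\| \epsilon - \epsilon_\theta\!\left( \sqrt{\bar\alpha_k}\,\tau^0 + \sqrt{1-\bar\alpha_k}\,\epsilon,\; k,\; o \right) \right\|^2 \right]

% Stage 2: DPPO-style fine-tuning, treating each denoising step \tau^{k} \to \tau^{k-1} as an action
r_k(\theta) = \frac{\pi_\theta(\tau^{k-1} \mid \tau^{k}, o)}{\pi_{\theta_{\mathrm{old}}}(\tau^{k-1} \mid \tau^{k}, o)}, \qquad
\mathcal{L}_{\mathrm{RL}}(\theta) = -\,\mathbb{E}\left[ \min\!\left( r_k(\theta)\,\hat{A},\; \operatorname{clip}\!\left(r_k(\theta),\, 1-\delta,\, 1+\delta\right)\hat{A} \right) \right]
```

Because each reverse step is a Gaussian with a closed-form density, the ratio $r_k$ is tractable, which is what makes PPO applicable to a diffusion policy at all.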

Abstract

Information gathering in large-scale or time-critical scenarios (e.g., environmental monitoring, search and rescue) requires broad coverage within limited time budgets, motivating the use of multi-agent systems. These scenarios are commonly formulated as multi-agent informative path planning (MAIPP), where multiple agents must coordinate to maximize information gain while operating under budget constraints. A central challenge in MAIPP is ensuring effective coordination while the belief over the environment evolves with incoming measurements. Recent learning-based approaches address this by using distributions over future positions as "intent" to support coordination. However, these autoregressive intent predictors are computationally expensive and prone to compounding errors. Inspired by the effectiveness of diffusion models as expressive, long-horizon policies, we propose AID, a fully decentralized MAIPP framework that leverages diffusion models to generate long-term trajectories in a non-autoregressive manner. AID first performs behavior cloning on trajectories produced by existing MAIPP planners and then fine-tunes the policy using reinforcement learning via Diffusion Policy Policy Optimization (DPPO). This two-stage pipeline enables the policy to inherit expert behavior while learning improved coordination through online reward feedback. Experiments demonstrate that AID consistently improves upon the MAIPP planners it is trained from, achieving up to 4x faster execution and 17% increased information gain, while scaling effectively to larger numbers of agents. Our implementation is publicly available at https://github.com/marmotlab/AID.
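To make "non-autoregressive" concrete, here is a minimal DDPM-style sampling sketch: starting from Gaussian noise, every denoising step updates all $T_p$ waypoints of a trajectory jointly, instead of predicting one future position at a time and feeding it back in. The linear noise schedule, the step count, and `dummy_eps_model` (a hypothetical stand-in that returns zeros where AID would use its trained, observation-conditioned network) are illustrative assumptions.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule: betas, alphas, and cumulative products of alphas."""
    betas = [beta_start + (beta_end - beta_start) * k / (num_steps - 1) for k in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def dummy_eps_model(traj, k, obs):
    """Hypothetical placeholder for a learned noise predictor eps_theta(tau_k, k, obs)."""
    return [[0.0, 0.0] for _ in traj]


def sample_trajectory(eps_model, obs, horizon=8, num_steps=20, seed=0):
    """Sample a whole horizon x 2 trajectory: each reverse step denoises all waypoints at once."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    traj = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(horizon)]
    for k in reversed(range(num_steps)):
        eps = eps_model(traj, k, obs)
        coef = betas[k] / math.sqrt(1.0 - alpha_bars[k])
        new_traj = []
        for (x, y), (ex, ey) in zip(traj, eps):
            # Posterior mean of the reverse Gaussian p(tau_{k-1} | tau_k).
            mx = (x - coef * ex) / math.sqrt(alphas[k])
            my = (y - coef * ey) / math.sqrt(alphas[k])
            if k > 0:
                # Add noise with variance beta_k (one common DDPM choice); none on the last step.
                sigma = math.sqrt(betas[k])
                mx += sigma * rng.gauss(0.0, 1.0)
                my += sigma * rng.gauss(0.0, 1.0)
            new_traj.append([mx, my])
        traj = new_traj
    return traj


if __name__ == "__main__":
    for waypoint in sample_trajectory(dummy_eps_model, obs=None):
        print(waypoint)
```

The cost is `num_steps` network calls for the whole horizon, regardless of $T_p$, which is one intuition for why a diffusion intent predictor can be cheaper than rolling out an autoregressive one step by step.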

Paper Structure

This paper contains 27 sections, 5 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Example run of AID with 3 agents. (1) shows the agents' trajectories, where the translucent segment is the black agent's predicted future path. (1) and (4) depict the GP-predicted mean and standard deviation of the information distribution (see the Gaussian process section), with brighter cells indicating higher values. (2) shows the ground-truth information distribution, and (5) highlights the current high-interest region (see the MAIPP problem formulation section). (6) visualizes the black agent's intent distribution from its predicted future positions, while (3) shows the fused intent distribution of the other agents (see the agent intent section).
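Panels (1) and (4) of Figure 1 show a GP posterior mean and standard deviation over the map. The following pure-Python sketch shows how such maps can be computed from point measurements with standard GP regression. The squared-exponential kernel, its length scale and variance, the noise level, and the unit-square coordinates are illustrative assumptions, not AID's settings.

```python
import math


def rbf(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between two 2D points (assumed kernel choice)."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return variance * math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = [0.0] * len(L)
    for i in range(len(L)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x


def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(train_x, train_y, query_x, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at each query point."""
    n = len(train_x)
    K = [[rbf(train_x[i], train_x[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, train_y))  # alpha = K^{-1} y
    means, stds = [], []
    for q in query_x:
        k_star = [rbf(q, x) for x in train_x]
        means.append(sum(ks * a for ks, a in zip(k_star, alpha)))
        v = solve_lower(L, k_star)  # ||v||^2 = k*^T K^{-1} k*
        var = rbf(q, q) - sum(vi * vi for vi in v)
        stds.append(math.sqrt(max(var, 0.0)))
    return means, stds


if __name__ == "__main__":
    # Two measurements; evaluate on a coarse 5 x 5 grid over the unit square.
    grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]
    mean, std = gp_posterior([(0.2, 0.2), (0.8, 0.8)], [1.0, 0.0], grid)
    print(max(mean), max(std))
```

The standard deviation shrinks near measured locations and stays close to the prior scale far from them, which is why a planner can use it as a map of where new measurements would be most informative.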
  • Figure 2: Pipeline for AID.
  • Figure 3: Example of agent intent generated by the diffusion model. Five trajectory predictions were generated per agent with a planning horizon $T_p = 8$ (see the pretraining section). For individual intents, the colored star denotes the agent’s current position, squares indicate predicted destinations, and the dots with connecting lines represent intermediate waypoints and the full predicted paths. In the fused intent visualization, the colored star reflects the mean predicted position across the agent’s sampled trajectories.
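The per-agent and fused intents in Figures 1 and 3 can be approximated, under simple assumptions, by turning sampled waypoints into a normalized occupancy distribution over grid cells and then averaging the distributions of the other agents. Uniform weighting across time steps, integer grid cells, and fusion by averaging are our assumptions; the paper may weight or combine intents differently. `intent_map` and `fuse_intents` are hypothetical helper names.

```python
from collections import defaultdict


def intent_map(trajectories, grid_size):
    """Empirical intent distribution for one agent.

    trajectories: sampled trajectories, each a list of (x, y) waypoints
    (e.g. 5 samples of T_p = 8 waypoints). Every waypoint adds equal mass
    to the grid cell it falls in, so the returned values sum to 1.
    """
    counts = defaultdict(float)
    total = 0
    for traj in trajectories:
        for x, y in traj:
            # Clamp to the map so out-of-bounds predictions land on the border cells.
            cell = (min(max(int(x), 0), grid_size - 1),
                    min(max(int(y), 0), grid_size - 1))
            counts[cell] += 1.0
            total += 1
    return {cell: c / total for cell, c in counts.items()}


def fuse_intents(maps):
    """Average several agents' intent maps into one normalized distribution."""
    fused = defaultdict(float)
    for m in maps:
        for cell, p in m.items():
            fused[cell] += p / len(maps)
    return dict(fused)


if __name__ == "__main__":
    # Agent A: 5 samples sweeping along the bottom row; agent B: 5 samples parked at (3, 3).
    traj_a = [[(float(t), 0.0) for t in range(8)] for _ in range(5)]
    traj_b = [[(3.0, 3.0)] * 8 for _ in range(5)]
    fused = fuse_intents([intent_map(traj_a, 10), intent_map(traj_b, 10)])
    print(fused[(3, 3)], fused[(0, 0)])
```

Following Figure 1, each agent would pass only its teammates' maps to `fuse_intents`, keeping its own intent separate, so that it can steer away from regions others are already likely to cover.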