COMPASS: Cooperative Multi-Agent Persistent Monitoring using Spatio-Temporal Attention Network

Xingjian Zhang, Yizhuo Wang, Guillaume Sartoretti

TL;DR

COMPASS introduces a decentralized multi-agent framework for persistent monitoring that combines Gaussian Process-based belief estimation with a spatio-temporal attention network operating on a graph-based environment. Each agent locally updates beliefs, reasons over history and spatial relations through a shared Transformer backbone, and coordinates via compact belief exchanges without a central planner. Centralized training with PPO yields policies that minimize global uncertainty while balancing exploration, redundancy, and movement cost, achieving better uncertainty reduction and visitation balance than strong baselines. The approach is validated in high-fidelity simulations and a 3D AirSim deployment, highlighting scalable, uncertainty-aware coordination suitable for real-world persistent surveillance tasks.

Abstract

Persistent monitoring of dynamic targets is essential in real-world applications such as disaster response, environmental sensing, and wildlife conservation, where mobile agents must continuously gather information under uncertainty. We propose COMPASS, a multi-agent reinforcement learning (MARL) framework that enables decentralized agents to persistently monitor multiple moving targets efficiently. We model the environment as a graph, where nodes represent spatial locations and edges capture topological proximity, allowing agents to reason over structured layouts and revisit informative regions as needed. Each agent independently selects actions based on a shared spatio-temporal attention network that we design to integrate historical observations and spatial context. We model target dynamics using Gaussian Processes (GPs), which support principled belief updates and enable uncertainty-aware planning. We train COMPASS using centralized value estimation and decentralized policy execution under an adaptive reward setting. Our extensive experiments demonstrate that COMPASS consistently outperforms strong baselines in uncertainty reduction, target coverage, and coordination efficiency across dynamic multi-target scenarios.
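
To make the GP-based belief update concrete, the following is a minimal sketch, not the authors' implementation. It assumes a purely spatial squared-exponential kernel, a 5x5 grid of node coordinates, and scalar "target presence" observations; the names `rbf_kernel` and `gp_posterior`, the length scale, and the noise level are all illustrative assumptions. The posterior standard deviation plays the role of the per-node uncertainty that agents are meant to reduce.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of row vectors."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-2, length_scale=1.0, variance=1.0):
    """Return the GP posterior mean and standard deviation at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs, length_scale, variance)
    k_oo = k_oo + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs, length_scale, variance)

    # Cholesky-based solve for numerical stability.
    chol = np.linalg.cholesky(k_oo)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_qo @ alpha

    v = np.linalg.solve(chol, k_qo.T)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Hypothetical environment: graph nodes placed on a 5x5 grid in [0, 1]^2.
nodes = np.array([[i / 4, j / 4] for i in range(5) for j in range(5)])

# Assumed observations: a target seen at node 0, none seen at node 12.
x_obs = nodes[[0, 12]]
y_obs = np.array([1.0, 0.0])

mean, std = gp_posterior(x_obs, y_obs, nodes, length_scale=0.3)

# Nodes far from any observation keep high uncertainty and are
# natural candidates for the next visit.
most_uncertain_node = int(np.argmax(std))
```

In this sketch, uncertainty collapses near visited nodes and stays close to the prior elsewhere, which is the signal an uncertainty-aware planner would act on. A model of moving targets would also need a temporal component in the kernel, which is omitted here.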

Paper Structure

This paper contains 23 sections, 6 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Illustration of the COMPASS framework in a simulated multi-agent wildlife monitoring task. (Left) Visualization of three UAV agents' trajectories and corresponding belief maps over dynamic targets; brighter areas denote higher uncertainty regions that agents are incentivized to explore. (Right) AirSim-based 3D deployment, where three UAVs cooperatively track multiple moving animal targets modeled with distinct 3D meshes.
  • Figure 2: Overview of the COMPASS spatio-temporal attention network. Agent observations are first integrated by Gaussian Processes (GPs) to update belief maps. Encoded historical node features are processed by a Temporal Encoder and a Spatial Encoder with positional and masking mechanisms. The fused representation is used by shared Actor and Critic heads for decentralized policy execution, enabling cooperative uncertainty-aware decision-making among agents.
  • Figure 3: Average uncertainty over mission time. COMPASS achieves the fastest and most stable uncertainty reduction compared to baseline methods ($K=200, M=3, N=8$). Solid lines show means; shaded areas denote standard deviations over 20 runs.
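
The masking mechanism mentioned in the Figure 2 caption can be illustrated with a small sketch of single-head attention over graph nodes. This is an assumption-laden illustration rather than the paper's architecture: it assumes the mask restricts each node to attend only to itself and its graph neighbors, and the function name `masked_attention`, the feature sizes, and the random projection weights are all hypothetical.

```python
import numpy as np


def masked_attention(features, adjacency, w_q, w_k, w_v):
    """Single-head scaled dot-product attention restricted by a graph mask.

    adjacency[i, j] > 0 means node i may attend to node j. Every row must
    contain at least one allowed entry (e.g. a self-loop).
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])

    # Disallowed pairs get -inf so their softmax weight is exactly zero.
    scores = np.where(adjacency > 0, scores, -np.inf)

    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


# Hypothetical 4-node path graph 0-1-2-3 with self-loops.
adjacency = np.array(
    [
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 1, 1],
    ]
)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 6))  # assumed per-node input features
w_q = rng.normal(size=(6, 8))
w_k = rng.normal(size=(6, 8))
w_v = rng.normal(size=(6, 8))

encoded, weights = masked_attention(features, adjacency, w_q, w_k, w_v)
```

Each row of `weights` is a probability distribution over the nodes that row is allowed to see, so information propagates only along graph edges within a single layer.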