Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation

Jialin Ying, Zhihao Li, Zicheng Dong, Guohua Wu, Yihuan Liao

TL;DR

The paper tackles cooperative pursuit-evasion in cluttered, partially observable environments with sparse rewards. It introduces PGF-MAPPO, a hierarchical framework that combines topological path guidance from A* planning with a directional frontier allocation strategy and a parameter-shared decentralized critic, enabling scalable, real-time control. Key contributions include Directional FPS with sector suppression, Hungarian assignment for target allocation, and a dense A*-based potential reward that accelerates learning and exploration. Experiments show zero-shot generalization from 10×10 training maps to larger 15×15 and 20×20 maps, outperforming rule-based and learning-based baselines in capture efficiency and robustness in cluttered settings.
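
To make the target-allocation step concrete, the sketch below assigns pursuers to frontier targets so that the total travel cost is minimal. It is only an illustration: it enumerates permutations instead of running the Hungarian algorithm (both return the same optimum, but brute force is practical only for small teams), and the Euclidean cost, the function name `assign_targets`, and the sample coordinates are assumptions, not details taken from the paper.

```python
from itertools import permutations
from math import hypot


def assign_targets(agents, targets):
    """Return (total_cost, mapping) where mapping[i] is the target index for agent i.

    Brute-force stand-in for Hungarian assignment (assumption: small teams only).
    Cost is Euclidean distance, which is an illustrative choice.
    Requires len(targets) >= len(agents).
    """
    n = len(agents)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(targets)), n):
        cost = sum(
            hypot(agents[i][0] - targets[j][0], agents[i][1] - targets[j][1])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, list(best_perm)


# Hypothetical positions: both agents are closest to target 0, so one must
# take the far target; the optimum sends agent 0 to target 1.
agents = [(0.0, 0.0), (2.0, 0.0)]
targets = [(1.0, 0.0), (-5.0, 0.0)]
cost, mapping = assign_targets(agents, targets)
# cost == 6.0, mapping == [1, 0]
```

A global assignment of this kind avoids the case where several pursuers greedily pick the same nearby frontier.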

Abstract

Collaborative pursuit-evasion in cluttered environments presents significant challenges due to sparse rewards and constrained Fields of View (FOV). Standard Multi-Agent Reinforcement Learning (MARL) often suffers from inefficient exploration and fails to scale to large scenarios. We propose PGF-MAPPO (Path-Guided Frontier MAPPO), a hierarchical framework bridging topological planning with reactive control. To resolve local minima and sparse rewards, we integrate an A*-based potential field for dense reward shaping. Furthermore, we introduce Directional Frontier Allocation, combining Farthest Point Sampling (FPS) with geometric angle suppression to enforce spatial dispersion and accelerate coverage. The architecture employs a parameter-shared decentralized critic, maintaining O(1) model complexity suitable for robotic swarms. Experiments demonstrate that PGF-MAPPO achieves superior capture efficiency against faster evaders. Policies trained on 10×10 maps exhibit robust zero-shot generalization to unseen 20×20 environments, significantly outperforming rule-based and learning-based baselines.
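
The idea of combining Farthest Point Sampling with angle suppression can be sketched as follows. This is one plausible reading, not the paper's implementation: bearings are measured from a single reference point `origin` (for example the team centroid), a candidate is suppressed when its bearing lies within `min_sep_rad` of an already selected target, and the names `directional_fps`, `bearing`, and `angle_diff` are hypothetical.

```python
import math


def bearing(origin, p):
    """Angle of point p as seen from origin, in radians."""
    return math.atan2(p[1] - origin[1], p[0] - origin[0])


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def directional_fps(frontiers, origin, k, min_sep_rad):
    """Select up to k frontier points that are far apart and angularly dispersed.

    Assumed procedure: seed with the frontier farthest from origin, then
    repeatedly add the candidate that maximizes its distance to the nearest
    selected point, skipping candidates inside a suppressed angular sector.
    """
    if not frontiers or k <= 0:
        return []
    chosen = [max(frontiers, key=lambda p: math.dist(p, origin))]
    while len(chosen) < k:
        candidates = [
            p for p in frontiers
            if p not in chosen
            and all(
                angle_diff(bearing(origin, p), bearing(origin, c)) >= min_sep_rad
                for c in chosen
            )
        ]
        if not candidates:  # every remaining frontier lies in a suppressed sector
            break
        chosen.append(max(candidates, key=lambda p: min(math.dist(p, c) for c in chosen)))
    return chosen


# Hypothetical frontier set: (9, 1) shares a sector with (10, 0) and is skipped.
frontiers = [(10.0, 0.0), (9.0, 1.0), (0.0, 8.0), (-7.0, 0.0)]
picked = directional_fps(frontiers, (0.0, 0.0), k=3, min_sep_rad=math.radians(30))
# picked == [(10.0, 0.0), (-7.0, 0.0), (0.0, 8.0)]
```

Plain FPS only maximizes point-to-point distance; the angular check additionally prevents two targets from lying along the same direction, which is what pushes agents to disperse.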

Paper Structure

This paper contains 38 sections, 9 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: The proposed PGF-MAPPO framework architecture. The system operates hierarchically: (Left) The Perception & State Module builds a local belief map and manages mode switching via a hierarchical finite-state machine (HFSM). (Middle) The Hierarchical Guidance Module generates high-level sub-goals using Directional Frontier Allocation (during exploration) or last-known-position (LKP) tracking (during pursuit), and computes an A* path to provide topological guidance. (Right) The Shared RL Controller processes the augmented observation, including the A* guidance vector, to output continuous kinematic actions. Dense rewards derived from the A* potential field facilitate stable training.
  • Figure 2: Evolution of environmental complexity. (a) Training Environment ($10\times10$): A fixed layout with sparse obstacles used for all curriculum stages. (b) & (c) Evaluation Environments: Unseen $15\times15$ and $20\times20$ maps with randomly generated, dense obstacles. The transition from (a) to (c) represents a significant leap in difficulty, testing the zero-shot generalization capability of the PGF-MAPPO policy.
  • Figure 3: Quantitative comparison results. (a) Capture Success Rate... (b) Average Capture Time...
  • Figure 4: Map coverage rate over time steps across three scales. (b) and (c) show that our strategy (solid blue line) explores the environment significantly faster than baselines. The comparison with "w/o Angle Suppression" (purple dotted line) highlights the critical role of our spatial dispersion mechanism in large-scale maps.
  • Figure 5: Training curves of the core reward sum across curriculum stages (S1-S5). While all methods perform similarly in simple stages (S1-S2), the Euclidean Guidance baseline (Red) suffers a performance drop in complex stages (S4-S5). In contrast, our A* Guidance (Blue) maintains a stable upward trend, demonstrating robust convergence in cluttered environments.
  • ...and 2 more figures
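
The contrast between A* guidance and Euclidean guidance in the captions above can be illustrated with a small grid example. The sketch assumes a 4-connected occupancy grid and the standard potential-based shaping form F = γΦ(s') − Φ(s) with Φ equal to the negative A* path length; the paper's actual reward terms, map representation, and coefficients are not given here, and `astar_length`, `shaping_reward`, and `GRID` are hypothetical names.

```python
import heapq


def astar_length(grid, start, goal):
    """Shortest 4-connected path length in steps, or None if unreachable.

    grid is a list of strings, '#' marks an obstacle; cells are (row, col).
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: admissible on a unit-cost 4-connected grid
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None


def shaping_reward(grid, pos, next_pos, goal, gamma=0.99):
    """Potential-based shaping with Phi(s) = -(A* path length from s to goal)."""
    def phi(p):
        length = astar_length(grid, p, goal)
        if length is None:
            raise ValueError("goal unreachable from %r" % (p,))
        return -float(length)

    return gamma * phi(next_pos) - phi(pos)


# Hypothetical map: a wall separates the agent at (0, 2) from the goal at (0, 4).
GRID = [
    "...#...",
    "...#...",
    "...#...",
    ".......",
]
# Stepping down to (1, 2) increases the straight-line distance to the goal,
# yet shortens the A* path from 8 to 7 steps, so the shaped reward is positive.
r = shaping_reward(GRID, (0, 2), (1, 2), (0, 4), gamma=1.0)  # r == 1.0
```

In this example a Euclidean potential would penalize the detour around the wall, which is the kind of local minimum the A*-based potential is meant to remove.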