Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets
Stefano Covone, Italo Napolitano, Francesco De Lellis, Mario di Bernardo
TL;DR
This work tackles the shepherding of non-cohesive targets by multiple decentralized herders, introducing a hierarchical policy-gradient framework based on PPO and MAPPO. Both the driving policy and the target-selection policy are learned in a fully model-free setting with continuous actions: the driving component is trained in a single-herder/single-target scenario, and the target-selection component is trained in multi-agent contexts. The approach improves settling times and path efficiency over a model-based baseline, scales to larger target sets using topological sensing, and remains robust under parameter variations. The results are relevant to real-world multi-robot shepherding and indirect-control problems; future work targets truly large-scale systems, heterogeneous agents, and physical-robot validation.
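The two-level decomposition and the use of topological sensing can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hand-written rules in `select_target` and `drive_action` are hypothetical stand-ins for the learned target-selection and driving policies, topological sensing is assumed to mean that each herder observes only its `k` nearest targets, and all function names and the `standoff` and `max_speed` parameters are invented for the example.

```python
import numpy as np

def k_nearest_targets(herder_pos, target_pos, k):
    """Assumed topological sensing: indices of the k targets closest to the herder."""
    dists = np.linalg.norm(target_pos - herder_pos, axis=1)
    k = min(k, len(target_pos))
    return np.argsort(dists)[:k]

def select_target(herder_pos, target_pos, goal_pos, k):
    """Placeholder high-level policy (a learned network in the paper):
    among the sensed targets, pick the one farthest from the goal."""
    sensed = k_nearest_targets(herder_pos, target_pos, k)
    goal_dists = np.linalg.norm(target_pos[sensed] - goal_pos, axis=1)
    return int(sensed[np.argmax(goal_dists)])

def drive_action(herder_pos, chosen_target, goal_pos, standoff=1.0, max_speed=1.0):
    """Placeholder low-level policy (a learned network in the paper):
    a continuous velocity toward a point behind the target, opposite the goal."""
    away = chosen_target - goal_pos
    norm = np.linalg.norm(away)
    direction = away / norm if norm > 1e-9 else np.zeros_like(away)
    waypoint = chosen_target + standoff * direction
    step = waypoint - herder_pos
    step_norm = np.linalg.norm(step)
    if step_norm > max_speed:
        # Cap the speed while keeping the direction of motion.
        step = step * (max_speed / step_norm)
    return step

def hierarchical_step(herder_pos, target_pos, goal_pos, k=3):
    """One decentralized decision: choose a target, then compute a driving action."""
    idx = select_target(herder_pos, target_pos, goal_pos, k)
    return idx, drive_action(herder_pos, target_pos[idx], goal_pos)
```

Each herder would call `hierarchical_step` on its own local observation, which keeps the input size fixed at `k` targets regardless of how many targets are present in total.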
Abstract
We propose a decentralized reinforcement learning solution for multi-agent shepherding of non-cohesive targets using policy-gradient methods. Our architecture integrates target selection with target driving through Proximal Policy Optimization, overcoming the discrete-action constraints of previous Deep Q-Network approaches and enabling smoother agent trajectories. This model-free framework solves the shepherding problem without prior knowledge of the dynamics. Experiments demonstrate the method's effectiveness and its scalability to larger numbers of targets and to limited sensing capabilities.
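For readers unfamiliar with Proximal Policy Optimization, the clipped surrogate loss at its core can be sketched in a few lines of NumPy. This is a generic illustration of the standard PPO objective with a diagonal Gaussian policy over continuous actions, not the authors' training code: the function names are hypothetical, and the clipping value `clip_eps=0.2` is a commonly used default assumed here rather than a setting reported in the source.

```python
import numpy as np

def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian policy, summed over action dimensions."""
    var = std ** 2
    per_dim = -0.5 * ((action - mean) ** 2 / var + np.log(2.0 * np.pi * var))
    return np.sum(per_dim, axis=-1)

def ppo_clip_loss(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over the batch."""
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = np.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the minimum removes the incentive to move the ratio outside the clip range.
    return -np.mean(np.minimum(unclipped, clipped))
```

Because the policy outputs the mean (and standard deviation) of a continuous distribution, actions such as herder velocities need not be discretized, which is the contrast with Deep Q-Network approaches drawn in the abstract.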
