Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets

Stefano Covone; Italo Napolitano; Francesco De Lellis; Mario di Bernardo

TL;DR

This work tackles shepherding of non-cohesive targets with multiple decentralized herders by introducing a hierarchical policy-gradient framework based on PPO and MAPPO. Both the driving and the target-selection policies are learned in a fully model-free setting with continuous actions: the driving policy is trained in a single-herder, single-target scenario, and the target-selection policy in multi-agent scenarios. Compared with a model-based baseline, the approach achieves shorter settling times and better path efficiency, scales to larger target sets using topological sensing, and remains robust under parameter variations. The results are relevant to real-world multi-robot shepherding and indirect-control problems; future work targets truly large-scale systems, heterogeneous agents, and validation on physical robots.

Abstract

We propose a decentralized reinforcement learning solution for multi-agent shepherding of non-cohesive targets using policy-gradient methods. Our architecture integrates target selection with target driving through Proximal Policy Optimization (PPO), overcoming the discrete-action constraints of previous Deep Q-Network approaches and enabling smoother agent trajectories. The framework is model-free: it solves the shepherding problem without prior knowledge of the dynamics. Experiments demonstrate the method's effectiveness and its scalability to larger numbers of targets and to limited sensing capabilities.
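
For readers unfamiliar with Proximal Policy Optimization, the following minimal sketch computes the standard PPO clipped surrogate objective for a batch of samples. It is illustrative only and is not taken from the paper's implementation: the function name `ppo_clipped_objective`, its argument names, and the clipping value `clip_eps=0.2` (a commonly used default) are assumptions.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current and the data-collecting policy. All names here are hypothetical.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(advantages) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (lower) bound of the clipped and unclipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data, which is what makes the update "proximal". Because the ratio is computed from log-probabilities, the same expression applies to continuous action distributions such as a Gaussian policy.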
