Decentralized Shepherding of Non-Cohesive Swarms Through Cluttered Environments via Deep Reinforcement Learning
Cristiana Punzo, Italo Napolitano, Cinzia Tomaselli, Mario di Bernardo
TL;DR
This paper tackles decentralized shepherding of non-cohesive targets in cluttered environments using a two-layer hierarchical approach. The low-level driving policy is learned with Proximal Policy Optimization (PPO) in a minimal setting with one herder, one target, and one obstacle (1H-1T), and is then deployed in larger multi-agent scenarios without retraining, guided by a decentralized high-level target assignment rule. The method yields collision-free trajectories and robust convergence to the circular goal region $\Omega_G$, outperforming a vortex-based heuristic in 1H-1T tests and scaling to 10 herders and 100 targets (10H-100T) with three obstacles. The results demonstrate a scalable, model-free framework for indirect control in complex domains; future work includes safety guarantees and perception-based sensing.
Abstract
This paper investigates decentralized shepherding in cluttered environments, where a limited number of herders must guide a larger group of non-cohesive, diffusive targets toward a goal region in the presence of static obstacles. A hierarchical control architecture is proposed that combines a high-level target assignment rule, which pairs each herder with a selected target, with a learning-based low-level driving module that steers the assigned target. The low-level policy is trained with Proximal Policy Optimization in a one-herder, one-target scenario with a single rectangular obstacle and is then applied directly to multi-agent settings with multiple obstacles, without retraining. Numerical simulations show smooth, collision-free trajectories and consistent convergence to the goal region, highlighting the potential of reinforcement learning for scalable, model-free shepherding in complex environments.
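To make the two-layer structure concrete, the sketch below shows one way a decentralized controller of this kind could be organized. It is illustrative only: the abstract does not state the assignment rule, so the rule used here (each herder picks, among the targets it can sense that lie outside the goal region, the one farthest from the goal center) is an assumption. The function names, the `sensing_radius` parameter, and the geometric `placeholder_driving_policy` are likewise hypothetical; in the paper the low-level module is a policy learned with PPO, and obstacles are handled by that policy rather than by the stand-in shown here.

```python
import math

def assign_target(herder, targets, goal_center, goal_radius, sensing_radius):
    """High-level layer (assumed rule, not taken from the paper).

    Among targets within the herder's sensing radius and outside the goal
    region, return the index of the one farthest from the goal center.
    Returns None when no such target exists.
    """
    best_idx, best_dist = None, -1.0
    for i, target in enumerate(targets):
        d_goal = math.dist(target, goal_center)
        if d_goal <= goal_radius:
            continue  # already inside the goal region
        if math.dist(target, herder) > sensing_radius:
            continue  # not visible to this herder
        if d_goal > best_dist:
            best_idx, best_dist = i, d_goal
    return best_idx

def placeholder_driving_policy(herder, target, goal_center, offset=0.5):
    """Low-level layer stand-in (hypothetical, replaces the learned PPO policy).

    Returns a unit velocity that moves the herder toward a point located
    `offset` behind the target, on the side opposite the goal. Obstacles are
    ignored in this stand-in.
    """
    gx, gy = target[0] - goal_center[0], target[1] - goal_center[1]
    g_norm = math.hypot(gx, gy) or 1.0  # avoid division by zero
    desired = (target[0] + offset * gx / g_norm,
               target[1] + offset * gy / g_norm)
    dx, dy = desired[0] - herder[0], desired[1] - herder[1]
    d_norm = math.hypot(dx, dy)
    if d_norm == 0.0:
        return (0.0, 0.0)
    return (dx / d_norm, dy / d_norm)

def step_herders(herders, targets, goal_center, goal_radius, sensing_radius,
                 policy=placeholder_driving_policy):
    """One decentralized control step: each herder decides independently."""
    commands = []
    for herder in herders:
        idx = assign_target(herder, targets, goal_center,
                            goal_radius, sensing_radius)
        if idx is None:
            commands.append((0.0, 0.0))  # nothing to chase: stay put
        else:
            commands.append(policy(herder, targets[idx], goal_center))
    return commands
```

Because each herder uses only its own position and the targets it can sense, the same per-herder routine can be replicated across any number of herders, which is the property that lets a policy trained on a single herder and target be reused in larger scenarios.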
