ARBoids: Adaptive Residual Reinforcement Learning With Boids Model for Cooperative Multi-USV Target Defense
Jiyue Tao, Tongsheng Shen, Dexin Zhao, Feitian Zhang
TL;DR
ARBoids addresses the challenging target defense problem for unmanned surface vehicles (USVs) in which the attacker is more agile than the defenders, by combining a Boids-based baseline with a learnable residual policy. The method introduces a state-dependent adapter that blends a DRL policy with the Boids baseline, trains it with Soft Actor-Critic under centralized training with decentralized execution (CTDE), and uses curriculum learning to confront progressively stronger attackers. In high-fidelity Gazebo simulations, ARBoids outperforms pure Boids, residual, and vanilla DRL baselines, remains robust to variations in attacker agility, and generalizes to unseen team sizes. The approach offers improved interception success and scalable coordination for cooperative USV defense, with potential for real-world deployment and extension to adversarial learning settings.
Abstract
The target defense problem (TDP) for unmanned surface vehicles (USVs) concerns intercepting an adversarial USV before it breaches a designated target region, using one or more defending USVs. A particularly challenging scenario arises when the attacker exhibits superior maneuverability compared to the defenders, significantly complicating effective interception. To tackle this challenge, this letter introduces ARBoids, a novel adaptive residual reinforcement learning framework that integrates deep reinforcement learning (DRL) with the biologically inspired, force-based Boids model. Within this framework, the Boids model serves as a computationally efficient baseline policy for multi-agent coordination, while DRL learns a residual policy to adaptively refine and optimize the defenders' actions. The proposed approach is validated in a high-fidelity Gazebo simulation environment, demonstrating superior performance over traditional interception strategies, including pure force-based approaches and vanilla DRL policies. Furthermore, the learned policy exhibits strong adaptability to attackers with diverse maneuverability profiles, highlighting its robustness and generalization capability. The code of ARBoids will be released upon acceptance of this letter.
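To make the division of labor between the Boids baseline and the learned residual concrete, the following is a minimal sketch and not the authors' implementation. It assumes 2D point positions, a simplified Boids rule with only two terms (pursuit of the attacker and separation from teammates), and a convex blend weighted by a scalar alpha in [0, 1] that stands in for the output of the state-dependent adapter. The function names `boids_action` and `blend_actions`, the weights, and the separation radius are all hypothetical; the abstract does not specify the actual force terms or the blending rule.

```python
import math


def boids_action(defender, teammates, attacker,
                 w_pursue=1.0, w_sep=1.0, sep_radius=5.0):
    """Hypothetical force-based baseline for one defender.

    Sums a unit-length pursuit force toward the attacker and a
    separation force away from teammates closer than sep_radius,
    then clips the result to at most unit magnitude.
    """
    # Pursuit: unit vector from the defender toward the attacker.
    dx, dy = attacker[0] - defender[0], attacker[1] - defender[1]
    dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
    fx, fy = w_pursue * dx / dist, w_pursue * dy / dist

    # Separation: push away from nearby teammates, stronger when closer.
    for mate in teammates:
        rx, ry = defender[0] - mate[0], defender[1] - mate[1]
        d = math.hypot(rx, ry)
        if 0.0 < d < sep_radius:
            gain = w_sep * (sep_radius - d) / sep_radius
            fx += gain * rx / d
            fy += gain * ry / d

    # Clip so the baseline command never exceeds unit magnitude.
    norm = math.hypot(fx, fy)
    if norm > 1.0:
        fx, fy = fx / norm, fy / norm
    return (fx, fy)


def blend_actions(a_boids, a_rl, alpha):
    """Assumed blending rule: (1 - alpha) * a_boids + alpha * a_rl.

    alpha stands in for the adapter output and is clamped to [0, 1].
    alpha = 0 recovers pure Boids; alpha = 1 recovers the pure DRL action.
    """
    alpha = min(max(alpha, 0.0), 1.0)
    return tuple((1.0 - alpha) * b + alpha * r
                 for b, r in zip(a_boids, a_rl))


# Usage: one defender, one nearby teammate, attacker to the north.
baseline = boids_action((0.0, 0.0), [(1.0, 0.0)], (0.0, 10.0))
learned = (0.0, 1.0)  # placeholder for a DRL policy output
command = blend_actions(baseline, learned, alpha=0.5)
```

In a full system the scalar `alpha` and the action `learned` would both come from trained networks evaluated on the defender's observation; here they are fixed placeholders so that the blending step can be inspected in isolation.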
