Adaptive AUV Hunting Policy with Covert Communication via Diffusion Model
Xu Guo, Xiangwang Hou, Minrui Xu, Jianrui Chen, Jingjing Wang, Jun Du, Yong Ren
TL;DR
To counter eavesdropping by the target in cooperative AUV hunting, the paper formulates a target-hunting problem constrained by covert communication and introduces AMADP, an offline MARL algorithm that uses diffusion models to generate diverse hunter trajectories under covertness constraints. The method combines a Markov game framework with a diffusion-based policy generator and an adaptive attention mechanism to coordinate three AUVs around a target. The framework explicitly models underwater acoustic channels and block-fading covert channels, and imposes a KL-divergence constraint to guard against eavesdropping while optimizing encirclement success. Experimental results show that AMADP converges faster and achieves higher hunting success rates than state-of-the-art offline MARL baselines while satisfying the covertness constraint.
Abstract
Collaborative underwater target hunting by multiple autonomous underwater vehicles (AUVs) plays a significant role in various domains, especially military missions. Existing research predominantly focuses on designing efficient hunting policies with high success rates, particularly in the face of the target's evasion capabilities. However, in real-world scenarios, the target may not only adjust its evasion policy based on its observations and predictions but also possess eavesdropping capabilities. If communication among hunter AUVs, such as the exchange of hunting policies, is intercepted by the target, the target can adapt its escape policy accordingly, significantly reducing the success rate of the hunting mission. To address this challenge, we propose a collaborative target hunting framework with guaranteed covert communication, which ensures efficient hunting in complex underwater environments while defending against the target's eavesdropping. To the best of our knowledge, this is the first study to incorporate the confidentiality of inter-agent communication into the design of target hunting policies. Furthermore, given the complexity of coordinating multiple AUVs in dynamic and unpredictable environments, we propose an adaptive multi-agent diffusion policy (AMADP), which incorporates the strong generative ability of diffusion models into a multi-agent reinforcement learning (MARL) algorithm. Experimental results demonstrate that AMADP achieves faster convergence and higher hunting success rates while satisfying the covertness constraints.
