Optimizing UAV-UGV Coalition Operations: A Hybrid Clustering and Multi-Agent Reinforcement Learning Approach for Path Planning in Obstructed Environment
Shamyo Brotee, Farhan Kabir, Md. Abdur Razzaque, Palash Roy, Md. Mamun-Or-Rashid, Md. Rafiul Hassan, Mohammad Mehedi Hassan
TL;DR
This work tackles efficient path planning for heterogeneous UAV-UGV coalitions in obstructed environments by introducing MEANCRFT, which uses a modified mean-shift clustering algorithm to partition targets into circular zones governed by a zone radius $R$. The coalition is trained with two multi-agent deep reinforcement learning (MADRL) frameworks, MADDPG and MAPPO, in two phases: UGVs and UAVs are trained separately and then combined to achieve collision-free navigation while minimizing the UAV and UGV travel lengths $F_a$ and $L_g$. Key contributions include the zoning heuristic, flexible coalition sizes, detailed reward structures, and comprehensive OpenAI Gym experiments showing substantial performance gains over baselines. The results indicate that the proposed zoning and dual-MADRL approach improves robustness and efficiency in real-world missions such as post-disaster search and rescue, where rapid, coordinated, multi-vehicle operations are critical.
Abstract
One of the most critical applications undertaken by coalitions of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is reaching predefined targets by following the most time-efficient routes while avoiding collisions. Unfortunately, UAVs are hampered by limited battery life, and UGVs face challenges in reachability due to obstacles and elevation variations. Existing literature primarily focuses on one-to-one coalitions, which constrains the efficiency of reaching targets. In this work, we introduce a novel approach for a UAV-UGV coalition with a variable number of vehicles, employing a modified mean-shift clustering algorithm to segment targets into multiple zones. Each vehicle utilizes Multi-agent Deep Deterministic Policy Gradient (MADDPG) and Multi-agent Proximal Policy Optimization (MAPPO), two advanced reinforcement learning algorithms, to form an effective coalition for navigating obstructed environments without collisions. This approach of assigning targets to various circular zones, based on density and range, significantly reduces the time required to reach these targets. Moreover, introducing variability in the number of UAVs and UGVs in a coalition enhances task efficiency by enabling simultaneous multi-target engagement. The results of our experimental evaluation demonstrate that our proposed method substantially surpasses current state-of-the-art techniques, nearly doubling efficiency in terms of target navigation time and task completion rate.
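To make the zoning idea concrete, the following is a minimal sketch of a standard flat-kernel mean shift that groups 2-D targets into zones of radius $R$. It is not the modified algorithm proposed in this work, whose details are not given in the abstract; the function name `mean_shift_zones`, its parameters, and the rule for merging nearby centres are assumptions made only for illustration.

```python
import math


def mean_shift_zones(targets, radius, max_iter=100, tol=1e-6):
    """Group 2-D targets into zones using a plain flat-kernel mean shift.

    Hypothetical helper for illustration; not the paper's modified algorithm.
    """
    modes = []
    for cx, cy in targets:
        # Shift a window of the given radius toward the local mean of targets.
        for _ in range(max_iter):
            near = [(x, y) for x, y in targets
                    if math.hypot(x - cx, y - cy) <= radius]
            if not near:  # defensive guard; the window normally keeps a target
                break
            nx = sum(x for x, _ in near) / len(near)
            ny = sum(y for _, y in near) / len(near)
            shift = math.hypot(nx - cx, ny - cy)
            cx, cy = nx, ny
            if shift < tol:
                break
        modes.append((cx, cy))

    # Assumed merge rule: converged centres within one radius form one zone.
    centres = []
    for mx, my in modes:
        if not any(math.hypot(mx - c[0], my - c[1]) <= radius for c in centres):
            centres.append((mx, my))

    # Assign every target to its nearest zone centre.
    zones = {i: [] for i in range(len(centres))}
    for tx, ty in targets:
        nearest = min(
            range(len(centres)),
            key=lambda k: math.hypot(tx - centres[k][0], ty - centres[k][1]),
        )
        zones[nearest].append((tx, ty))
    return centres, zones


if __name__ == "__main__":
    demo = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    centres, zones = mean_shift_zones(demo, radius=3.0)
    print(len(centres), [len(v) for v in zones.values()])
```

With the six demo targets and a radius of 3, the sketch finds two zone centres with three targets each. Note that nearest-centre assignment does not guarantee that every target lies within $R$ of its centre; a zoning scheme with a hard radius constraint would need an additional check.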
