Optimizing UAV-UGV Coalition Operations: A Hybrid Clustering and Multi-Agent Reinforcement Learning Approach for Path Planning in Obstructed Environment

Shamyo Brotee; Farhan Kabir; Md. Abdur Razzaque; Palash Roy; Md. Mamun-Or-Rashid; Md. Rafiul Hassan; Mohammad Mehedi Hassan

TL;DR

This work tackles efficient path planning for heterogeneous UAV-UGV coalitions in obstructed environments by introducing MEANCRFT, which uses a modified mean-shift clustering algorithm to partition targets into circular zones whose extent is governed by a zone radius $R$. Coalition training employs two MADRL frameworks, MADDPG and MAPPO, trained in two phases for UGVs and UAVs and then combined to achieve collision-free navigation while minimizing the UAV and UGV travel lengths, $F_a$ and $L_g$ respectively. Key contributions include the zoning heuristic, flexible coalition sizes, detailed reward structures, and comprehensive OpenAI Gym experiments showing substantial performance gains over baselines. The results indicate that the proposed zoning and dual MADRL approach enhances robustness and efficiency for real-world missions such as post-disaster search and rescue, where rapid, coordinated, multi-vehicle operations are critical.

Abstract

One of the most critical applications undertaken by coalitions of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is reaching predefined targets by following the most time-efficient routes while avoiding collisions. Unfortunately, UAVs are hampered by limited battery life, and UGVs face challenges in reachability due to obstacles and elevation variations. Existing literature primarily focuses on one-to-one coalitions, which constrains the efficiency of reaching targets. In this work, we introduce a novel approach for a UAV-UGV coalition with a variable number of vehicles, employing a modified mean-shift clustering algorithm to segment targets into multiple zones. Each vehicle utilizes Multi-agent Deep Deterministic Policy Gradient (MADDPG) and Multi-agent Proximal Policy Optimization (MAPPO), two advanced reinforcement learning algorithms, to form an effective coalition for navigating obstructed environments without collisions. This approach of assigning targets to various circular zones, based on density and range, significantly reduces the time required to reach these targets. Moreover, introducing variability in the number of UAVs and UGVs in a coalition enhances task efficiency by enabling simultaneous multi-target engagement. The results of our experimental evaluation demonstrate that our proposed method substantially surpasses current state-of-the-art techniques, nearly doubling efficiency in terms of target navigation time and task completion rate.
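To make the zoning idea concrete, the following is a minimal sketch of standard flat-kernel mean-shift clustering used to group 2-D targets into circular zones of radius $R$. It is not the paper's modified algorithm: the function name `mean_shift_zones`, the merge threshold of `R / 2`, and the iteration settings are all assumptions chosen for illustration.

```python
import numpy as np

def mean_shift_zones(targets, R, max_iter=100, tol=1e-6):
    """Plain flat-kernel mean-shift (illustrative, not the paper's variant).

    Returns (centers, labels): one center per zone and the zone index
    of every target.
    """
    targets = np.asarray(targets, dtype=float)
    modes = targets.copy()  # one candidate zone center per target
    for _ in range(max_iter):
        # Distance from every candidate center to every target.
        dists = np.linalg.norm(modes[:, None, :] - targets[None, :, :], axis=2)
        within = dists <= R
        counts = within.sum(axis=1, keepdims=True)
        sums = (within[:, :, None] * targets[None, :, :]).sum(axis=1)
        # Move each center to the mean of the targets inside its circle;
        # keep it in place if rounding leaves the circle empty.
        new_modes = np.where(counts > 0, sums / np.maximum(counts, 1), modes)
        shift = np.linalg.norm(new_modes - modes, axis=1).max()
        modes = new_modes
        if shift < tol:
            break

    # Merge candidate centers that converged to (nearly) the same point.
    # The R / 2 threshold is an assumption of this sketch.
    centers = []
    labels = np.empty(len(targets), dtype=int)
    for i, mode in enumerate(modes):
        for k, center in enumerate(centers):
            if np.linalg.norm(mode - center) < R / 2:
                labels[i] = k
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return np.array(centers), labels

if __name__ == "__main__":
    # Hypothetical target layout: two well-separated groups.
    demo_targets = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
    zone_centers, zone_labels = mean_shift_zones(demo_targets, R=3.0)
    print(zone_centers)
    print(zone_labels)
```

In a sketch like this, a larger $R$ yields fewer, wider zones, while a smaller $R$ yields more zones with fewer targets each; how the paper's modification adjusts zones beyond this baseline is not shown here.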

Paper Structure (23 sections, 15 equations, 6 figures, 3 tables, 4 algorithms)

Introduction
Related Work
System Model of MEANCRFT
Design Details of MEANCRFT
Zone Division: Assignment of Targets into Zones
Mean-Shift Clustering
Modified Mean-Shift Clustering
Assignment of Coalition to Zones
Constraints for MADRL Training
MADDPG Framework
MADDPG Reward Calculation
MAPPO Framework
MAPPO Reward Calculation
Experimental Evaluation
Simulation Platform
...and 8 more sections

Figures (6)

Figure 1: Visual representation of the relationships between key elements
Figure 2: Flowchart of the proposed solution framework
Figure 5: Average score per 1000 episodes for UGVs (MADDPG and MAPPO)
Figure 6: Impacts of varying the number of targets
Figure 7: Impacts of varying the combination of UAV-UGV coalitions
...and 1 more figure
