A Scalable Decentralized Reinforcement Learning Framework for UAV Target Localization Using Recurrent PPO

Leon Fernando, Billy Pik Lik Lau, Chau Yuen, U-Xuan Tan

TL;DR

This work addresses UAV target localization in GNSS-denied, perceptually degraded environments using a Recurrent PPO framework whose LSTM handles partial observability. It evaluates a single drone and a decentralized two-drone swarm that use a shared grid map and two sensor modalities: the single drone reaches $93\%$ accuracy, while the two-drone setup reaches $86\%$ and needs fewer average steps to locate the target. The approach uses an eight-direction action space, a reward structure that balances exploration against penalties, and a purpose-built observation space, and it is reported to generalize across diverse indoor maps and to scale to swarm coordination. The results point to the potential of UAV swarms for efficient target localization in GNSS-denied scenarios, with practical implications for SEM, disaster response, and SAR tasks.

Abstract

Rapid advances in unmanned aerial vehicles (UAVs) have unlocked numerous applications, including environmental monitoring, disaster response, and agricultural surveying. Enhancing the collective behavior of multiple decentralized UAVs can significantly improve these applications through more efficient and coordinated operations. In this study, we explore a Recurrent PPO model for target localization in perceptually degraded environments, such as places without GNSS/GPS signals. We first developed a single-drone approach for target identification, followed by a decentralized two-drone model. Our approach can use two types of sensors on the UAVs: a detection sensor and a target signal sensor. The single-drone model achieved an accuracy of 93%, while the two-drone model achieved an accuracy of 86% but required fewer steps on average to locate the target. This demonstrates the potential of our method for UAV swarms, offering efficient and effective localization of radiant targets in complex environmental conditions.

Paper Structure

This paper contains 10 sections, 5 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Grid map representation of the single-drone simulation environment. Unexplored cells are in state 0. Cells explored once are in state 2 (yellow), cells explored twice are in state 3 (green), and cells explored three or more times are in state 4 (pink). Obstacles are in state 1 (black).
  • Figure 2: Model performance comparison
  • Figure 3: Proposed model architecture
  • Figure 4: Training maps for simulation environments.
  • Figure 5: Evaluation graph - single drone simulation environment.
  • ...and 1 more figure
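
The cell-state encoding in the Figure 1 caption can be illustrated with a minimal Python sketch. Only the five state values (0 to 4) come from the caption. The names `CellState` and `mark_visit`, the list-of-lists grid, and the choice to leave obstacle cells unchanged are assumptions made for illustration, not the authors' implementation.

```python
from enum import IntEnum


class CellState(IntEnum):
    """Cell states as listed in the Figure 1 caption."""
    UNEXPLORED = 0
    OBSTACLE = 1
    EXPLORED_ONCE = 2
    EXPLORED_TWICE = 3
    EXPLORED_THRICE_PLUS = 4


def mark_visit(grid, row, col):
    """Advance the visit state of grid[row][col] in place; return the new state.

    Hypothetical helper: obstacle cells are left unchanged (an assumption),
    and the visit count saturates at state 4.
    """
    state = grid[row][col]
    if state == CellState.OBSTACLE:
        return int(state)
    if state == CellState.UNEXPLORED:
        new_state = int(CellState.EXPLORED_ONCE)
    else:
        # States 2 and 3 step up by one; state 4 stays at 4.
        new_state = min(state + 1, int(CellState.EXPLORED_THRICE_PLUS))
    grid[row][col] = new_state
    return new_state


# Usage: a 2x2 map with one obstacle in the top-right cell.
grid = [[0, 1],
        [0, 0]]
for _ in range(4):
    mark_visit(grid, 0, 0)  # the cell passes through 2, 3, 4 and stays at 4
```

Because state 1 is reserved for obstacles, the first visit takes a cell from 0 directly to 2, so the state value is not a plain visit counter.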