Deep Reinforcement Learning for Time-Critical Wilderness Search And Rescue Using Drones

Jan-Hendrik Ewers; David Anderson; Douglas Thomson

TL;DR

This work tackles time-critical wilderness search and rescue (WiSAR) by training drones to fly search paths that maximize the probability of finding a missing person quickly. It introduces SAC-FS-CNN, a Soft Actor-Critic (SAC) based agent that operates on a continuous probability distribution map (PDM), represented as a Gaussian mixture, and uses a continuous action space enabled by cubature. In simulation, the method improves on the traditional lawnmower pattern and the LHC_GW_CONV strategy in probability over distance (POD), and it also achieves a shorter distance to find (DTF) and a higher percentage found (PF). The study notes that real-world validation, ethical considerations, and robustness to more complex terrains and sensor models remain open. Overall, the paper provides a framework for integrating a priori probability information into DRL-driven UAV search missions, with its performance gains demonstrated in simulated settings only.

Abstract

Traditional search and rescue methods in wilderness areas can be time-consuming and have limited coverage. Drones offer a faster and more flexible solution, but optimizing their search paths is crucial. This paper explores the use of deep reinforcement learning to create efficient search missions for drones in wilderness environments. Our approach leverages a priori data about the search area and the missing person in the form of a probability distribution map. This allows the deep reinforcement learning agent to learn optimal flight paths that maximize the probability of finding the missing person quickly. Experimental results show that our method achieves a significant improvement in search times compared to traditional coverage planning and search planning algorithms. In one comparison, deep reinforcement learning is found to outperform other algorithms by over $160\%$, a difference that can mean life or death in real-world search operations. Additionally, unlike previous work, our approach incorporates a continuous action space enabled by cubature, allowing for more nuanced flight patterns.

Paper Structure (17 sections, 11 equations, 11 figures, 6 tables)

Introduction
Related Work
Method
Modelling
Environment
PDM
Reward
Training Algorithm
Policy Architecture
Results
Experimental setup
Probability Over Distance (POD)
Distance To Find (DTF) and Percentage Found (PF)
Limitations
Conclusion
...and 2 more sections

Figures (11)

Figure 1: An example multimodal bivariate Gaussian PDM
Figure 2: Visualizations of concepts related to the buffered polygon representation of the seen area.
Figure 3: The anatomy of the area calculation of an isolated step of length $\lambda$ with buffer $R_\textit{buffer}$ as used in \ref{eqn:k}.
Figure 4: Top-level representation of a typical reinforcement learning data flow. The agent is also commonly referred to as the policy.
Figure 5: Policy network architecture
...and 6 more figures
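
The multimodal bivariate Gaussian PDM named in Figure 1 can be illustrated with a minimal sketch. The function names (`gaussian_mixture_pdm`, `probability_in_box`), the mixture parameters, and the axis-aligned box are all hypothetical and chosen only for illustration. The integration here is a plain midpoint rule on a grid, not the cubature scheme used in the paper.

```python
import numpy as np


def gaussian_mixture_pdm(points, means, covs, weights):
    """Evaluate a bivariate Gaussian mixture density at each (x, y) point."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    density = np.zeros(len(points))
    for mean, cov, weight in zip(means, covs, weights):
        mean = np.asarray(mean, dtype=float)
        cov = np.asarray(cov, dtype=float)
        diff = points - mean
        inv_cov = np.linalg.inv(cov)
        # Normalizing constant of a 2D Gaussian: 1 / (2 * pi * sqrt(det(cov)))
        norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
        # Squared Mahalanobis distance of every point to this component's mean
        maha_sq = np.einsum("ni,ij,nj->n", diff, inv_cov, diff)
        density += weight * norm * np.exp(-0.5 * maha_sq)
    return density


def probability_in_box(means, covs, weights, lower, upper, n=200):
    """Approximate the probability mass inside an axis-aligned box.

    Uses a midpoint rule on an n-by-n grid (an illustrative stand-in,
    not the paper's cubature method).
    """
    dx = (upper[0] - lower[0]) / n
    dy = (upper[1] - lower[1]) / n
    xs = lower[0] + dx * (np.arange(n) + 0.5)  # cell centers along x
    ys = lower[1] + dy * (np.arange(n) + 0.5)  # cell centers along y
    gx, gy = np.meshgrid(xs, ys)
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    return float(gaussian_mixture_pdm(pts, means, covs, weights).sum() * dx * dy)
```

With mixture weights that sum to one, integrating over a box that covers all components should return a value close to one, and integrating over the region a drone has already seen gives the probability mass accumulated so far under these assumptions.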
