Deep Reinforcement Learning-Based Approach for a Single Vehicle Persistent Surveillance Problem with Fuel Constraints

Manav Mishra; Hritik Bana; Saswata Sarkar; Sujeevraja Sanjeevi; PB Sujit; Kaarthik Sundar

TL;DR

This work tackles the Single Vehicle Persistent Surveillance with Fuel Constraints (SVPSFC) problem, in which a fuel-limited UAV starting from a depot must repeatedly visit $n$ targets while minimizing the maximum revisit time across targets. The authors formulate SVPSFC as a Markov decision process (MDP) and solve it with a Deep Reinforcement Learning (DRL) approach based on Proximal Policy Optimization (PPO), using dummy targets to improve transferability and action masking to enforce fuel feasibility. Results show that the DRL method consistently outperforms a greedy baseline across varying $n$ and fuel capacities, achieving lower maximum revisit times, while action masking guarantees a 100% mission success rate. The findings demonstrate the practical potential of fuel-aware UAV persistent surveillance and point to extensions to prioritized targets and multi-vehicle coordination for greater real-world applicability.
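The two techniques named above, dummy targets and action masking, can be pictured with a minimal Python sketch. It is not the authors' implementation: it assumes unit speed, fuel burned equal to distance flown, a full refuel at the depot, the depot stored at index 0, and dummy targets placed on the depot and permanently masked out. The helper names `pad_with_dummies` and `action_mask` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pad_with_dummies(depot: Point, targets: Sequence[Point],
                     n_max: int) -> Tuple[List[Point], List[bool]]:
    """Pad an instance to a fixed number of targets (hypothetical helper).

    Index 0 is the depot. Dummy targets are placed on the depot and flagged
    as not real, so a fixed-size policy network can be reused on instances
    with fewer targets while never selecting a dummy.
    """
    if len(targets) > n_max:
        raise ValueError("instance has more targets than the policy supports")
    n_dummy = n_max - len(targets)
    coords = [depot, *targets] + [depot] * n_dummy
    is_real = [True] * (1 + len(targets)) + [False] * n_dummy
    return coords, is_real


def action_mask(pos: int, fuel: float, coords: Sequence[Point],
                is_real: Sequence[bool]) -> List[bool]:
    """Return one boolean per node: True if flying there is allowed.

    Assumption: a target is feasible only if the vehicle can reach it AND
    still fly back to the depot with the fuel that remains afterwards.
    """
    depot = coords[0]
    mask = []
    for j, (p, real) in enumerate(zip(coords, is_real)):
        if j == pos or not real:
            mask.append(False)  # staying put, or a dummy target
        elif j == 0:
            mask.append(True)   # the depot is kept reachable by construction
        else:
            needed = math.dist(coords[pos], p) + math.dist(p, depot)
            mask.append(needed <= fuel)
    return mask
```

Because every permitted move leaves at least enough fuel to fly home, the depot stays reachable from every state the masked policy can visit (starting from a full tank at the depot), which is how masking can rule out fuel exhaustion by construction rather than through a reward penalty.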

Abstract

This article presents a deep reinforcement learning-based approach to a persistent surveillance mission in which a single unmanned aerial vehicle (UAV) with fuel or time-of-flight constraints, initially stationed at a depot, must repeatedly visit a set of targets with equal priority. Because of these constraints, the vehicle must regularly return to the depot to refuel or recharge its battery. The objective of the problem is to determine an optimal sequence of target visits that minimizes the maximum time elapsed between successive visits to any target while ensuring that the vehicle never runs out of fuel or charge. We present a deep reinforcement learning algorithm to solve this problem and report numerical experiments that corroborate its effectiveness compared with common-sense greedy heuristics.
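To make the objective concrete, the sketch below evaluates the maximum revisit time of a route that is flown over and over, with a refuel at every depot visit. The periodic-route setting, unit speed, fuel burn equal to distance, and the function name `max_revisit_time` are assumptions for illustration; a learned policy is not required to produce a periodic route.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def max_revisit_time(coords: Sequence[Point], route: Sequence[int],
                     fuel_capacity: float) -> float:
    """Maximum revisit time of a route repeated forever (hypothetical helper).

    coords[0] is the depot; route must start at the depot (index 0) and is
    implicitly closed back to it. Raises ValueError if fuel would run out.
    """
    if not route or route[0] != 0:
        raise ValueError("route must start at the depot (index 0)")
    closed = list(route) + [0]
    t, fuel = 0.0, fuel_capacity
    visits: Dict[int, List[float]] = {j: [] for j in range(1, len(coords))}
    for a, b in zip(closed, closed[1:]):
        leg = math.dist(coords[a], coords[b])
        t += leg
        fuel -= leg
        if fuel < 0:
            raise ValueError(f"fuel exhausted on leg {a}->{b}")
        if b == 0:
            fuel = fuel_capacity  # refuel / recharge at the depot
        else:
            visits[b].append(t)   # arrival time at target b
    period = t
    worst = 0.0
    for times in visits.values():
        if not times:
            return math.inf  # a target that is never visited is never revisited
        gaps = [t2 - t1 for t1, t2 in zip(times, times[1:])]
        gaps.append(period - times[-1] + times[0])  # wrap-around into the next cycle
        worst = max(worst, max(gaps))
    return worst
```

The wrap-around gap matters: a target visited once per cycle has a revisit time equal to the full cycle length, so every fuel-forced detour through the depot lengthens the cycle and directly inflates the objective.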

Paper Structure

This paper contains 14 sections, 5 equations, and 4 figures.

Introduction
Related Work
Notations & Problem Definition
Methodology
MDP Formulation
Techniques to improve transferability
Action masking to enforce fuel restrictions
Experiments and Analysis
Experimental Setup
Results
Comparison of D-RL and the greedy baseline for increasing $n$
Comparison of D-RL and the greedy baseline for varying fuel capacity
Qualitative comparison of the trajectories
Conclusion & Future Work

Figures (4)

Figure 1: Environment with a UAV that visits the targets represented by the colored dots. Each target has an associated clock that shows the time elapsed since its last visit. In this instance, the UAV is at the orange target, so that target's clock reads zero. Having reached this target, the UAV selects the next target to visit from the grey-colored targets; in this case, the selected target lies $7$ units from the orange target.
Figure 2: Comparison of the maximum revisit time achieved by the D-RL and greedy approaches.
Figure 3: Average maximum revisit time over 100 test configurations with $n = 14$ targets as the maximum fuel capacity is varied.
Figure 4: (a) Environment configuration for a depot and six targets located at $\{(0, 0), (2, 1), (0.5, 7), (7, 2), (8, 8), (5, 6), (4, 9)\}$. (b) The first 8 steps of the UAV's trajectory obtained using the greedy approach. (c) The first 8 steps of the UAV's trajectory obtained using the D-RL approach.
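The per-target clocks in Figure 1 suggest a simple state and transition, and a natural greedy rule on top of them. The sketch below assumes the greedy baseline picks the fuel-feasible target with the largest clock (nearest first on ties) and otherwise returns to the depot; the exact heuristic used in the paper may differ, and `greedy_rollout` is a hypothetical name. Unit speed and fuel burn equal to distance are again assumed.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def greedy_rollout(coords: Sequence[Point], fuel_capacity: float,
                   steps: int) -> Tuple[List[int], List[float]]:
    """Simulate a hypothetical greedy policy on per-target clocks.

    coords[0] is the depot. Each target's clock counts the time since its
    last visit (as in Figure 1): flying a leg advances every clock by the leg
    length, and the visited target's clock resets to zero.
    """
    depot = coords[0]
    pos, fuel = 0, fuel_capacity
    clocks = [0.0] * len(coords)  # clocks[0] belongs to the depot and is unused
    path = [0]
    for _ in range(steps):
        # Feasible targets: reachable now, with enough fuel left to return home.
        feasible = [j for j in range(1, len(coords))
                    if j != pos
                    and math.dist(coords[pos], coords[j])
                    + math.dist(coords[j], depot) <= fuel]
        if feasible:
            # Greedy choice: the target that has waited longest; ties go to the nearest.
            nxt = max(feasible,
                      key=lambda j: (clocks[j], -math.dist(coords[pos], coords[j])))
        else:
            nxt = 0  # no target is safely reachable: return to refuel
        leg = math.dist(coords[pos], coords[nxt])
        clocks = [c + leg for c in clocks]
        fuel -= leg
        if nxt == 0:
            fuel = fuel_capacity
        else:
            clocks[nxt] = 0.0
        pos = nxt
        path.append(nxt)
    return path, clocks
```

A rule like this only looks one step ahead: it ignores where the chosen target leaves the vehicle relative to the remaining targets and the depot, which is one plausible reason a learned policy can achieve lower maximum revisit times.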
