PyroTrack: Belief-Based Deep Reinforcement Learning Path Planning for Aerial Wildfire Monitoring in Partially Observable Environments

Sahand Khoshdel; Qi Luo; Fatemeh Afghah

TL;DR

PyroTrack presents a belief-based deep reinforcement learning (DRL) approach for UAV tracking of wildfire frontlines in partially observable environments. It formulates the mission as a POMDP and maintains a belief state, updated with a Bayesian (Beta-Binomial) rule, to compensate for limited field-of-view observations; a certainty map accounts for information aging. The method uses a two-phase scanning-then-tracking strategy and a dual-input Deep Q-Network that fuses belief/observation information with the UAV state. Results indicate that belief-based representations improve coverage and frontline tracking in dynamic fire scenarios with varying wind and vegetation while maintaining performance in static settings, suggesting practical benefits for UAV wildfire management under real-world constraints. The work advances single-agent UAV wildfire monitoring by accounting for vegetation and wind dynamics, power limits, and the age of the information behind its uncertainty estimates, with clear pathways to multi-agent extensions.
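
To make the belief update and certainty map concrete, here is a minimal sketch of a per-cell Beta-Binomial update with a decaying certainty value. It is an illustration under stated assumptions, not the authors' implementation: the names `CellBelief` and `step_belief`, the uniform Beta(1, 1) prior, and the geometric `decay` factor are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class CellBelief:
    # Assumed uniform Beta(1, 1) prior over the cell's ignition probability.
    alpha: float = 1.0      # pseudo-count of "burning" observations
    beta: float = 1.0       # pseudo-count of "not burning" observations
    certainty: float = 0.0  # 1.0 = just observed; decays while unobserved

    @property
    def p_fire(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


def step_belief(grid, observations, decay=0.9):
    """Update beliefs for one time step.

    grid: dict mapping (row, col) -> CellBelief
    observations: dict mapping (row, col) -> bool for cells inside the
        field of view (True = fire seen). Cells outside the field of
        view are absent from this dict.
    decay: assumed geometric aging factor for unobserved cells.
    """
    for pos, cell in grid.items():
        if pos in observations:
            if observations[pos]:
                cell.alpha += 1.0
            else:
                cell.beta += 1.0
            cell.certainty = 1.0      # fresh information
        else:
            cell.certainty *= decay   # information ages


# Usage: a 2x2 grid where the UAV sees only the top row.
grid = {(r, c): CellBelief() for r in range(2) for c in range(2)}
step_belief(grid, {(0, 0): True, (0, 1): False})
step_belief(grid, {})  # UAV moves away; nothing observed this step
```

After the two steps, the cell where fire was seen has a posterior mean of 2/3 and a certainty that has decayed once, while the never-observed bottom row stays at the prior mean of 0.5 with zero certainty.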

Abstract

Motivated by the agility, 3D mobility, and low-risk operation of autonomous unmanned aerial vehicles (UAVs) compared to human-operated management systems, this work studies UAV-based active wildfire monitoring, in which a UAV detects fire incidents in remote areas and tracks the fire frontline. A UAV path planning solution is proposed for realistic wildfire management missions in which only a single low-altitude drone with limited power and flight time is available. Given the limited field of view of commercial low-altitude UAVs, the problem is formulated as a partially observable Markov decision process (POMDP), in which wildfire progression outside the field of view leads to an inaccurate state representation that prevents the UAV from finding the optimal path to track the fire front in limited time. Common deep reinforcement learning (DRL)-based trajectory planning solutions require diverse drone-recorded wildfire data to generalize pre-trained models to real-time systems, and such data is not currently available at a diverse and standard scale. To narrow the gap caused by partial observability in the space of possible policies, a belief-based state representation supported by extensive simulated data is proposed, where the beliefs (i.e., ignition probabilities of different grid areas) are updated using a Bayesian framework for the cells within the field of view. The performance of the proposed solution, in terms of the ratio of detected fire cells and the monitored ignited area (MIA), is evaluated in a complex fire scenario with multiple rapidly growing fire batches, indicating that the belief state representation outperforms the observation state representation in both fire coverage and distance to the fire frontline.
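
As a rough illustration of the first evaluation metric, the sketch below computes a detected-fire-cell ratio as the fraction of ignited cells that the UAV has observed at least once. The exact definition is not given in this summary, so this reading, the function name `detected_fire_ratio`, and the set-of-coordinates representation are assumptions; the MIA metric is not attempted here for the same reason.

```python
def detected_fire_ratio(ignited_cells, observed_cells):
    """Fraction of ignited cells that appear in the UAV's observed set.

    ignited_cells: iterable of (row, col) cells that have ignited.
    observed_cells: iterable of (row, col) cells the UAV has seen.
    Returns 0.0 when nothing has ignited (assumed convention).
    """
    ignited = set(ignited_cells)
    if not ignited:
        return 0.0
    return len(ignited & set(observed_cells)) / len(ignited)


# Usage: four ignited cells, two of which fell inside the field of view.
ratio = detected_fire_ratio(
    {(0, 0), (0, 1), (1, 1), (2, 2)},
    {(0, 0), (1, 1), (5, 5)},
)
```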

Paper Structure (24 sections, 19 equations, 6 figures, 3 tables)

Introduction
Related Works
System Model
Forest Wildfire Model
Environment State Parameters
Environment State Initialization
Environment State Dynamics
Agent Model
State Space
Observation Space
Action Space
Reward Function
Proposed Method
Mission Phases
Scanning Phase
...and 9 more sections

Figures (6)

Figure 1: Sample Initialization of Wind Magnitude for $N = 30,\;N_{ign} = 10,\;A_{max} = 100,\;\epsilon_{rad} = 3$. $N_{ign}$ represents the number of initial ignitions. Maps: (Left: Ignition State, Middle: Distance from Fire, Right: Wind Magnitude)
Figure 2: The effect of fuel density and type on fuel consumption, shown in a sample spread scenario. Denser vegetation patches have higher initial fuel, leading to later burnout.
Figure 3: UAV model for fire-frontline tracking. The valid deviation denotes the acceptable action space, which may exclude the action with the highest value; in that case, the best action within the valid range is selected.
Figure 4: Trajectories of the UAV in a static environment setting for episodes 5, 15, 25 from left to right. Burnt cells are shown in black. (Trajectory is plotted over the final burnt-out wildfire)
Figure 5: Bridging over burnt cells - Trajectories of the UAV in a dynamic environment for episodes 5 and 10 from left to right. Burnt cells are shown in black, while ignited cells are shown in red.
...and 1 more figures
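
The action restriction described in the Figure 3 caption can be sketched as a masked argmax over Q-values. This is a minimal illustration, assuming a discrete action space of `n_headings` compass headings and a valid deviation measured in heading steps; `select_action`, `n_headings`, and `max_deviation` are hypothetical names, and the paper's actual action encoding may differ.

```python
def select_action(q_values, current_heading, max_deviation, n_headings=8):
    """Return the highest-valued heading within the valid deviation.

    q_values: sequence of Q-values, one per discrete heading index.
    current_heading: index of the UAV's current heading.
    max_deviation: largest allowed change, in heading steps (assumed unit).
    """
    def circular_gap(a, b):
        # Shortest distance between two headings on a circle of n_headings.
        d = abs(a - b) % n_headings
        return min(d, n_headings - d)

    valid = [
        a for a in range(n_headings)
        if circular_gap(a, current_heading) <= max_deviation
    ]
    # Greedy choice restricted to the valid headings.
    return max(valid, key=lambda a: q_values[a])


# Usage: the global best heading (index 4) is outside the valid range,
# so the best heading among {7, 0, 1} is chosen instead.
q = [0.1, 0.2, 0.9, 0.3, 1.5, 0.0, 0.4, 0.8]
restricted = select_action(q, current_heading=0, max_deviation=1)
unrestricted = select_action(q, current_heading=0, max_deviation=4)
```

With a deviation of one step, the selected heading is index 7 rather than the globally best index 4, which is only reachable once the deviation limit is relaxed.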
