Coordinated Autonomous Drones for Human-Centered Fire Evacuation in Partially Observable Urban Environments
Maria G. Mendoza, Addison Kalanther, Daniel Bostwick, Emma Stephan, Chinmay Maheshwari, Shankar Sastry
TL;DR
This paper addresses real-time, human-centered fire evacuation in partially observable urban environments by coordinating two heterogeneous UAVs, a high-level rescuer (HLR) and a low-level rescuer (LLR), to locate and guide panicked evacuees. It combines an agent-based panic model with a POMDP formulation and trains a centralized, recurrent policy via PPO to handle long-horizon planning and partial observability. The reward structure emphasizes visibility, proximity, and successful capture. Simulation experiments show significant reductions in time to safety and robust performance across varied initial conditions, while highlighting remaining limitations and avenues for scaling to multiple evacuees. The work offers a practical, autonomous framework that could augment emergency response in low-resource or high-risk settings and inform deployment strategies for urban disaster relief.
Abstract
Autonomous drone technology holds significant promise for enhancing search and rescue operations during evacuations by guiding humans toward safety and supporting broader emergency response efforts. However, its application in dynamic, real-time evacuation support remains limited. Existing models often overlook the psychological and emotional complexity of human behavior under extreme stress. In real-world fire scenarios, evacuees frequently deviate from designated safe routes due to panic and uncertainty. To address these challenges, this paper presents a multi-agent coordination framework in which autonomous Unmanned Aerial Vehicles (UAVs) assist human evacuees in real time by locating, intercepting, and guiding them to safety under uncertain conditions. We model the problem as a Partially Observable Markov Decision Process (POMDP), in which two heterogeneous UAV agents, a high-level rescuer (HLR) and a low-level rescuer (LLR), coordinate through shared observations and complementary capabilities. Human behavior is captured using an agent-based model grounded in empirical psychology, in which panic dynamically affects decision-making and movement in response to environmental stimuli. The environment features stochastic fire spread, unknown evacuee locations, and limited visibility, requiring the UAVs to plan over long horizons to search for humans and to adapt in real time. Our framework employs the Proximal Policy Optimization (PPO) algorithm with recurrent policies to enable robust decision-making in partially observable settings. Simulation results demonstrate that the UAV team can rapidly locate and intercept evacuees, significantly reducing the time required for them to reach safety compared to scenarios without UAV assistance.
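
For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective that the algorithm maximizes. It is not the authors' implementation: the function name `ppo_clipped_objective`, the clipping value `clip_eps=0.2`, and the plain-list inputs are assumptions made for illustration, and the recurrent policy that would produce the log-probabilities is omitted.

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed default,
    not a value reported in the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the two policies agree, the ratio is 1 and the objective reduces to the mean advantage. When the ratio leaves the clipping interval in the direction that would increase the objective, the clipped term takes over, which limits how far a single update can move the policy.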
