Factorized Deep Q-Network for Cooperative Multi-Agent Reinforcement Learning in Victim Tagging
Maria Ana Cardei, Afsaneh Doryab
TL;DR
This work addresses the challenge of minimizing victim tagging time in mass casualty incidents under uncertainty by formulating an ILP baseline and introducing five distributed heuristics that reflect varying communication capabilities. It then presents a Factorized Deep Q-Network (FDQN) MARL approach with a shared global state and decentralized actions, augmented by action masking to enable cooperative victim tagging. In extensive simulations, local, uncertainty-aware heuristics consistently outperform global strategies, while FDQN shows gains in smaller-scale scenarios but struggles as problem size grows, indicating complementary roles for learning and heuristics. Overall, the study provides actionable guidance for emergency response planning and highlights the potential and current limits of MARL in large-scale, real-time disaster response.
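As a rough illustration of what action masking with decentralized actions can look like, the sketch below picks one greedy action per agent from per-agent Q-values while skipping invalid actions. It is not the paper's implementation: the function name `masked_greedy_actions`, the list-of-lists layout, and the reading of an invalid action as "a victim that is already tagged" are all assumptions made for this example.

```python
def masked_greedy_actions(q_values, valid_mask):
    """Pick the highest-valued valid action for each agent.

    q_values[i][a]   : assumed Q-value of action a for agent i
    valid_mask[i][a] : True if agent i may take action a
                       (hypothetically, victim a is still untagged)
    """
    actions = []
    for agent_q, agent_mask in zip(q_values, valid_mask):
        # Keep only the action indices the mask allows for this agent.
        valid = [a for a, ok in enumerate(agent_mask) if ok]
        if not valid:
            raise ValueError("each agent needs at least one valid action")
        # Greedy choice restricted to the valid actions.
        actions.append(max(valid, key=lambda a: agent_q[a]))
    return actions


if __name__ == "__main__":
    q = [[1.0, 5.0, 2.0], [3.0, 0.0, 4.0]]
    mask = [[True, False, True], [True, True, False]]
    print(masked_greedy_actions(q, mask))
```

In this toy input each agent's highest-valued action is masked out, so the selection falls back to the best remaining valid action.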
Abstract
Mass casualty incidents (MCIs) are a growing concern, characterized by complexity and uncertainty that demand adaptive decision-making strategies. The victim tagging step in the emergency medical response must be completed quickly and is crucial for providing information to guide subsequent time-constrained response actions. In this paper, we present a mathematical formulation of multi-agent victim tagging to minimize the time it takes for responders to tag all victims. Five distributed heuristics are formulated and evaluated with simulation experiments. The heuristics considered are on-the-go, practical solutions that represent varying levels of situational uncertainty in the form of global or local communication capabilities, reflecting practical constraints. We further investigate the performance of a multi-agent reinforcement learning (MARL) strategy, factorized deep Q-network (FDQN), to minimize victim tagging time as compared to baseline heuristics. Extensive simulations demonstrate that, among the heuristics, methods with local communication are more efficient for adaptive victim tagging, specifically choosing the nearest victim with the option to replan. Analyzing all experiments, we find that our FDQN approach outperforms heuristics in smaller-scale scenarios, while heuristics excel in more complex scenarios. Our experiments span diverse levels of complexity, exploring the upper limits of MARL capabilities for real-world applications and revealing key insights.
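The "nearest victim with the option to replan" idea can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the heuristic as specified in the paper: responders move in a 2D plane at a fixed speed, tagging is instantaneous on arrival, every responder sees which victims are already tagged, and replanning means re-selecting the nearest untagged victim at every step. The names `nearest_untagged` and `simulate` are hypothetical.

```python
import math


def nearest_untagged(pos, victims, tagged):
    """Return the index of the closest untagged victim, or None if all are tagged."""
    candidates = [i for i in range(len(victims)) if i not in tagged]
    if not candidates:
        return None
    return min(candidates, key=lambda i: math.dist(pos, victims[i]))


def simulate(responders, victims, speed=1.0, max_steps=10_000):
    """Count the steps until every victim is tagged (assumed instant tagging)."""
    positions = [tuple(p) for p in responders]
    tagged = set()
    steps = 0
    while len(tagged) < len(victims) and steps < max_steps:
        steps += 1
        for r, pos in enumerate(positions):
            # Replan: re-select the nearest untagged victim at every step.
            target = nearest_untagged(pos, victims, tagged)
            if target is None:
                break
            tx, ty = victims[target]
            d = math.dist(pos, (tx, ty))
            if d <= speed:
                # Close enough to arrive this step: move there and tag.
                positions[r] = (tx, ty)
                tagged.add(target)
            else:
                # Otherwise advance one step of length `speed` toward the target.
                positions[r] = (pos[0] + speed * (tx - pos[0]) / d,
                                pos[1] + speed * (ty - pos[1]) / d)
    return steps


if __name__ == "__main__":
    print(simulate([(0, 0), (10, 0)], [(2, 0), (8, 0)]))
```

Because the target is recomputed each step, a responder automatically switches to another victim once a teammate tags its current target. A version without replanning would instead keep the target fixed until arrival.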
