Navigating the Smog: A Cooperative Multi-Agent RL for Accurate Air Pollution Mapping through Data Assimilation
Ichrak Mokhtari, Walid Bechkit, Mohamed Sami Assenine, Hervé Rivano
TL;DR
The paper tackles real-time air pollution mapping by optimizing where UAVs take measurements, so as to improve Data Assimilation (DA) without ground-truth data. It introduces a cooperative Multi-Agent Reinforcement Learning (MARL) framework in which autonomous drones act as agents in a Markov game, trained with independent Q-learning over a shared state and a team reward. Two credit assignment schemes are considered, equal split and difference rewards; the latter is implemented via a C51 distributional DQN. The DA objective is expressed through the team reward $R_t = \mathbb{E}[x^{a}_{t-1} - x^{a}_{t}]$, where $x^{a}_{t}$ is the DA analysis estimate at step $t$. This reward guides drones toward informative measurements and accelerates convergence of $x^{a}$ toward the true state. The DA model is based on the Best Linear Unbiased Estimator (BLUE) with a realistic observation structure. Experiments on the real-world FFT07 dataset demonstrate substantial DA improvements: the difference-rewards MARL performs close to ground-truth-guided baselines and remains robust under varying budgets and simulation quality. The work highlights scalable autonomous drone coordination for accurate pollution mapping and points to extensions to unsteady dispersion and other environmental monitoring challenges.
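To make the two credit assignment schemes concrete, here is a minimal Python sketch on a toy 4-cell grid. It is not the authors' implementation. The function names (`blue_analysis`, `team_reward`, `equal_split`, `difference_rewards`), the identity background covariance, the one-cell-per-drone observation operator, and the noise variance are all assumptions made for illustration. The team reward is a literal reading of the formula above, averaged over grid cells, with the background state standing in for $x^{a}_{t-1}$. Difference rewards use the standard counterfactual form (team reward with all drones minus team reward with drone $i$ removed); the summary does not say how the paper computes them.

```python
import numpy as np

def blue_analysis(x_b, B, cells, y, obs_var):
    """BLUE update x_a = x_b + K (y - H x_b), with K = B H^T (H B H^T + R)^-1."""
    if len(cells) == 0:
        return x_b.copy()                      # no measurement, no correction
    H = np.zeros((len(cells), x_b.size))
    H[np.arange(len(cells)), cells] = 1.0      # assumption: each drone samples one cell
    R = obs_var * np.eye(len(cells))           # assumption: independent sensor noise
    innovation = y - H @ x_b
    S = H @ B @ H.T + R
    return x_b + B @ H.T @ np.linalg.solve(S, innovation)

def team_reward(x_b, B, field, cells, obs_var=1.0):
    """Toy R_t: mean over cells of (previous estimate - new analysis)."""
    cells = list(cells)
    y = field[cells]                           # field only simulates sensor readings
    x_a = blue_analysis(x_b, B, cells, y, obs_var)
    return float(np.mean(x_b - x_a))

def equal_split(team_r, n_agents):
    """Every drone receives the same share of the team reward."""
    return np.full(n_agents, team_r / n_agents)

def difference_rewards(reward_fn, cells):
    """D_i = G(all drones) - G(all drones except i)."""
    g_all = reward_fn(cells)
    return np.array([g_all - reward_fn(cells[:i] + cells[i + 1:])
                     for i in range(len(cells))])

# Hypothetical toy setup: flat prior of 5.0, true field differs only in cell 0.
x_b = np.full(4, 5.0)
B = np.eye(4)
field = np.array([1.0, 5.0, 5.0, 5.0])
cells = [0, 1]                                 # drone 0 -> cell 0, drone 1 -> cell 1
reward_fn = lambda c: team_reward(x_b, B, field, c)

print("equal split:      ", equal_split(reward_fn(cells), len(cells)))
print("difference reward:", difference_rewards(reward_fn, cells))
```

In this toy setup only drone 0 measures a cell where the prior is wrong. Equal split still gives each drone 0.25, while difference rewards give 0.5 to drone 0 and 0 to drone 1, which illustrates why a counterfactual credit signal can be more informative for learning.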
Abstract
The rapid rise of air pollution events necessitates accurate, real-time monitoring for informed mitigation strategies. Data Assimilation (DA) methods provide promising solutions, but their effectiveness hinges heavily on optimal measurement locations. This paper presents a novel approach for air quality mapping where autonomous drones, guided by a collaborative multi-agent reinforcement learning (MARL) framework, act as airborne detectives. Moving beyond the limitations of static sensor networks, the drones interact cooperatively, adapting their flight paths in real time to gather optimal data for DA. Our approach employs a tailored reward function with dynamic credit assignment, enabling drones to prioritize informative measurements without requiring unavailable ground-truth data, making it practical for real-world deployments. Extensive experiments using a real-world dataset demonstrate that our solution achieves significantly improved pollution estimates, even with limited drone resources or limited prior knowledge of the pollution plume. Beyond air quality, this solution opens up possibilities for tackling diverse environmental challenges, such as wildfire detection and management, through scalable and autonomous drone cooperation.
