Using reinforcement learning to improve drone-based inference of greenhouse gas fluxes
Alouette van Hove, Kristoffer Aalstad, Norbert Pirk
TL;DR
This work addresses the challenge of mapping greenhouse gas flux hotspots at climate-model grid scales using drone observations. It combines Bayesian data assimilation with a Gaussian plume forward model and trains a tabular Q-learning agent to optimize the drone's sampling strategy under a battery constraint, comparing information-based rewards with an error-based reward. RL-trained drones achieve substantially more accurate posterior flux estimates (e.g., $\text{CRPS}<6\ \mathrm{mg\,CO_2\,m^{-2}\,s^{-1}}$) than drones following a predefined grid path (≈$20\ \mathrm{mg\,CO_2\,m^{-2}\,s^{-1}}$), demonstrating improved efficiency and accuracy for surface flux mapping. The framework is intended to support field deployment and extension to more complex flux fields; code is available for replication, and future work may scale the approach to neural-network function approximators.
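To make the data-assimilation step concrete, here is a minimal Python sketch under strong simplifying assumptions: the hotspot is treated as a ground-level point source whose emission rate is the surface flux F times a hypothetical area, plume widths grow linearly downwind, and both the prior on F and the sensor noise are Gaussian. Because a Gaussian plume is linear in source strength, each drone measurement then updates the belief about F in closed form, and the resulting Gaussian posterior can be scored with the closed-form CRPS. All constants and function names (`plume_footprint`, `assimilate`, `crps_gaussian`) are illustrative assumptions, not the paper's forward model or DA scheme.

```python
import math
import random

# Hypothetical plume settings (assumptions, not taken from the paper).
WIND_SPEED = 5.0       # mean wind speed along +x, m s^-1
HOTSPOT_AREA = 100.0   # area emitting the surface flux F, m^2
SPREAD_Y = 0.08        # crosswind plume width per metre downwind
SPREAD_Z = 0.06        # vertical plume width per metre downwind


def plume_footprint(x, y, z):
    """Concentration (mg m^-3) per unit surface flux (mg m^-2 s^-1) at (x, y, z).

    Treats the hotspot as a ground-level point source at the origin with
    emission rate F * HOTSPOT_AREA and total ground reflection, so the
    modelled concentration is footprint * F, i.e. linear in F.
    """
    if x <= 0.0:
        return 0.0  # no plume signal upwind of the source
    sy, sz = SPREAD_Y * x, SPREAD_Z * x
    lateral = math.exp(-y**2 / (2.0 * sy**2))
    vertical = 2.0 * math.exp(-z**2 / (2.0 * sz**2))  # real source plus image source
    return HOTSPOT_AREA / (2.0 * math.pi * WIND_SPEED * sy * sz) * lateral * vertical


def assimilate(mean, var, obs, footprint, obs_var):
    """Conjugate Gaussian update of the flux belief N(mean, var) given one
    observation obs = footprint * F + noise, with noise ~ N(0, obs_var)."""
    post_var = 1.0 / (1.0 / var + footprint**2 / obs_var)
    post_mean = post_var * (mean / var + footprint * obs / obs_var)
    return post_mean, post_var


def crps_gaussian(mean, sd, truth):
    """Closed-form CRPS of a Gaussian forecast N(mean, sd^2) for a scalar truth."""
    z = (truth - mean) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sd * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


if __name__ == "__main__":
    rng = random.Random(0)
    true_flux = 50.0              # hypothetical true flux, mg CO2 m^-2 s^-1
    mean, var = 20.0, 30.0**2     # vague Gaussian prior on F
    obs_var = 0.5**2              # hypothetical sensor noise variance, (mg m^-3)^2
    for x, y, z in [(20.0, 0.0, 1.0), (40.0, 2.0, 2.0), (60.0, -3.0, 1.5)]:
        g = plume_footprint(x, y, z)
        obs = g * true_flux + rng.gauss(0.0, math.sqrt(obs_var))
        mean, var = assimilate(mean, var, obs, g, obs_var)
    sd = math.sqrt(var)
    print(f"posterior {mean:.2f} +/- {sd:.2f}, CRPS {crps_gaussian(mean, sd, true_flux):.3f}")
```

A lower CRPS means a posterior that is both closer to the truth and appropriately narrow, which is why it serves as a single score for comparing sampling strategies.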
Abstract
Accurate mapping of greenhouse gas fluxes at the Earth's surface is essential for the validation and calibration of climate models. In this study, we present a framework for surface flux estimation with drones. Our approach uses data assimilation (DA) to infer fluxes from drone-based observations, and reinforcement learning (RL) to optimize the drone's sampling strategy. We demonstrate that an RL-trained drone can quantify a CO2 hotspot more accurately than a drone sampling along a predefined flight path that traverses the emission plume. We find that information-based reward functions can match the performance of an error-based reward function that quantifies the difference between the estimated surface flux and the true value. Reward functions based on information gain and information entropy can motivate actions that increase the drone's confidence in its updated belief, without requiring knowledge of the true surface flux. These findings inform further development of the framework toward mapping more complex surface flux fields.
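The abstract contrasts information-based rewards, which depend only on the drone's belief, with an error-based reward that needs the true flux. As a hedged sketch, assuming the belief about a scalar flux stays Gaussian (as in a linear-Gaussian update), all three rewards have closed forms and can feed a standard tabular Q-learning update with epsilon-greedy exploration. The specific reward definitions (entropy reduction, KL divergence of posterior from prior, reduction in absolute error), the state encoding (a hypothetical grid cell plus remaining battery steps), and the hyperparameters `alpha`, `gamma`, and `epsilon` are illustrative assumptions, not the authors' settings.

```python
import math
import random
from collections import defaultdict


def gaussian_entropy(var):
    """Differential entropy (nats) of a 1-D Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def reward_entropy(prior_var, post_var):
    """Entropy-based reward (assumed form): reduction in belief entropy."""
    return gaussian_entropy(prior_var) - gaussian_entropy(post_var)


def reward_information_gain(prior_mean, prior_var, post_mean, post_var):
    """Information gain (assumed form): KL(posterior || prior) for 1-D Gaussians."""
    return 0.5 * (math.log(prior_var / post_var)
                  + (post_var + (post_mean - prior_mean) ** 2) / prior_var - 1.0)


def reward_error(prior_mean, post_mean, true_flux):
    """Error-based reward (assumed form): reduction in absolute error; needs the truth."""
    return abs(prior_mean - true_flux) - abs(post_mean - true_flux)


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.95, terminal=False):
    """One tabular Q-learning step; terminal=True when the battery runs out."""
    best_next = 0.0 if terminal else max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])


if __name__ == "__main__":
    Q = defaultdict(float)            # Q[(state, action)] -> value, zero-initialised
    actions = ["north", "south", "east", "west"]
    rng = random.Random(1)
    state = ((0, 0), 10)              # hypothetical state: (grid cell, battery steps left)
    action = epsilon_greedy(Q, state, actions, epsilon=0.1, rng=rng)
    r = reward_entropy(prior_var=900.0, post_var=100.0)
    q_update(Q, state, action, r, ((1, 0), 9), actions)
    print(action, round(Q[(state, action)], 4))
```

Under a linear-Gaussian update the entropy reward depends only on the variances, so it does not depend on the measured value, whereas the KL information gain also rewards shifts in the mean. Only `reward_error` needs `true_flux`, which is available in simulation but not in the field.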
