Using reinforcement learning to improve drone-based inference of greenhouse gas fluxes

Alouette van Hove; Kristoffer Aalstad; Norbert Pirk

TL;DR

This work addresses the challenge of mapping greenhouse gas flux hotspots at climate-model grid scales using drone observations. It combines Bayesian data assimilation with a Gaussian plume forward model and trains a tabular Q-learning agent to optimize sampling under battery constraints, comparing information-based rewards to an error-based metric. RL-trained drones achieve substantially tighter posterior estimates (e.g., $\text{CRPS}<6\ \mathrm{mg\,CO_2\,m^{-2}\,s^{-1}}$) than grid-path baselines (≈$20\ \mathrm{mg\,CO_2\,m^{-2}\,s^{-1}}$), demonstrating improved efficiency and accuracy for surface flux mapping. The framework supports robust field deployment and extension to more complex flux fields, with code available for replication and future scaling to neural network function-approximators.
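The data assimilation step can be illustrated with a minimal sketch, assuming a single hotspot at the origin, a steady wind along +x, and a discrete grid of candidate flux strengths. The function names `plume_concentration` and `bayes_update`, the linear dispersion coefficients `a` and `b`, the wind speed `u`, and the Gaussian observation noise `noise_std` are all hypothetical choices for illustration and are not taken from the paper. Concentrations here are excess values above background in arbitrary consistent units.

```python
import math

def plume_concentration(q, x, y, z, u=2.0, h=0.0, a=0.2, b=0.1):
    """Excess concentration at (x, y, z) for a source of strength q at the origin.

    Textbook Gaussian plume with ground reflection. The dispersion laws
    sigma_y = a * x and sigma_z = b * x are illustrative assumptions.
    """
    if x <= 0.0:
        return 0.0  # this simple model has no plume upwind of the source
    sigma_y, sigma_z = a * x, b * x
    lateral = math.exp(-y**2 / (2.0 * sigma_y**2))
    vertical = (math.exp(-(z - h)**2 / (2.0 * sigma_z**2))
                + math.exp(-(z + h)**2 / (2.0 * sigma_z**2)))
    return q / (2.0 * math.pi * u * sigma_y * sigma_z) * lateral * vertical

def bayes_update(prior, q_grid, obs, position, noise_std):
    """Posterior over candidate fluxes q_grid after one noisy observation."""
    x, y, z = position
    weights = []
    for p, q in zip(prior, q_grid):
        predicted = plume_concentration(q, x, y, z)
        # Gaussian likelihood of the observation given this candidate flux
        likelihood = math.exp(-0.5 * ((obs - predicted) / noise_std) ** 2)
        weights.append(p * likelihood)
    total = sum(weights)
    if total == 0.0:
        return list(prior)  # observation uninformative at machine precision
    return [w / total for w in weights]

# Usage: a flat prior over 0..500 and one noise-free sample inside the plume.
q_grid = [50.0 * i for i in range(11)]
prior = [1.0 / len(q_grid)] * len(q_grid)
obs = plume_concentration(250.0, 50.0, 0.0, 10.0)
posterior = bayes_update(prior, q_grid, obs, (50.0, 0.0, 10.0), noise_std=0.01)
print(max(zip(posterior, q_grid)))
```

In this sketch a sample taken upwind of the source predicts zero excess concentration for every candidate flux, so it leaves the belief unchanged. That is one way to see why the choice of sampling positions matters for the posterior.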

Abstract

Accurate mapping of greenhouse gas fluxes at the Earth's surface is essential for the validation and calibration of climate models. In this study, we present a framework for surface flux estimation with drones. Our approach uses data assimilation (DA) to infer fluxes from drone-based observations, and reinforcement learning (RL) to optimize the drone's sampling strategy. Herein, we demonstrate that an RL-trained drone can quantify a CO2 hotspot more accurately than a drone sampling along a predefined flight path that traverses the emission plume. We find that information-based reward functions can match the performance of an error-based reward function that quantifies the difference between the estimated surface flux and the true value. Reward functions based on information gain and information entropy can motivate actions that increase the drone's confidence in its updated belief, without requiring knowledge of the true surface flux. These findings provide valuable insights for further development of the framework for the mapping of more complex surface flux fields.
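The three kinds of reward mentioned above can be sketched for a discrete belief over candidate flux values. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, information gain is taken to be the Kullback-Leibler divergence from prior to posterior, and the CRPS is computed with the standard identity $\text{CRPS} = E|X - y| - \tfrac{1}{2}E|X - X'|$, which is exact for a discrete distribution.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete belief p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def kl_divergence(posterior, prior):
    """D_KL(posterior || prior); assumes prior > 0 wherever posterior > 0."""
    return sum(pi * math.log(pi / qi)
               for pi, qi in zip(posterior, prior) if pi > 0.0)

def crps(p, q_grid, q_true):
    """CRPS of the discrete belief p on q_grid against the true flux q_true."""
    expected_abs_error = sum(pi * abs(qi - q_true) for pi, qi in zip(p, q_grid))
    expected_spread = sum(pi * pj * abs(qi - qj)
                          for pi, qi in zip(p, q_grid)
                          for pj, qj in zip(p, q_grid))
    return expected_abs_error - 0.5 * expected_spread

def rewards(prior, posterior, q_grid, q_true):
    """Candidate rewards for one belief update (hypothetical helper)."""
    return {
        "error": -crps(posterior, q_grid, q_true),    # needs the true flux
        "info_gain": kl_divergence(posterior, prior),  # truth-free
        "neg_entropy": -entropy(posterior),            # truth-free
    }

# Usage: a belief that sharpens around the true value of 250.
q_grid = [200.0, 250.0, 300.0]
prior = [1.0 / 3.0] * 3
posterior = [0.1, 0.8, 0.1]
print(rewards(prior, posterior, q_grid, q_true=250.0))
```

Only the error-based reward takes `q_true` as input. The other two depend solely on the prior and posterior beliefs, which is the property that makes them usable when the true surface flux is unknown.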

Paper Structure (14 sections, 3 figures, 1 table)

Introduction
Materials and methods
Bayesian data assimilation
Gaussian plume model
Synthetic observations
Reinforcement learning
Results
RL training convergence
Sampling strategy
Estimated posterior and final CRPS
Discussion
Conflict of interest
Resources
Acknowledgement

Figures (3)

Figure 1: Overview of the framework. Drone observations of gas concentration are fused with a Gaussian plume model. Through data assimilation (in orange) a more accurate estimate of the unobserved surface flux is inferred. A reinforcement learning algorithm (in green) is used to learn an optimal sampling policy for the positions of drone observations to reduce the model uncertainty as much as possible, and consequently increase the accuracy of the estimated surface flux.
Figure 2: The range-normalized sum of rewards per episode for models trained with different reward functions. A moving average over 1,000 episodes is shown.
Figure 3: Comparison of flux strength estimation by drones trained with different reward functions: (a) $r = - \text{CRPS}$, (b) $r=D_{\text{KL}}$ and (c) $r=-H$, and a drone flying a grid path. Left: The unperturbed concentration field at $z = 10\,\text{m}$ for a flux of $250\,\text{mg}\,\text{CO}_2\,\text{m}^{-2}\,\text{s}^{-1}$ with $400 + 30/\sqrt{12}\,\text{ppm}$ isoline (dark grey), including the sampling paths of RL-trained drones starting upwind (orange), crosswind (green) and downwind (purple) from the hotspot at $\boldsymbol{\times}$, and a baseline grid path (light grey) with data collection at 16 locations (dots). Right: The prior, true flux and estimated posterior of 10 random individual flights. Differences between flights are a result of noisy observations.
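
The reinforcement learning component (green in Figure 1) is described in the TL;DR as tabular Q-learning. The textbook update rule can be sketched as follows. The action set, the state encoding as (grid cell, remaining battery), and the values of the learning rate `alpha`, discount factor `gamma` and exploration rate `epsilon` are assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict

ACTIONS = ("north", "south", "east", "west")  # hypothetical drone moves

def make_q_table():
    """Q-table mapping any hashable state to one value per action."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy action selection."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # explore
    values = q_table[state]
    return max(ACTIONS, key=lambda a: values[a])  # exploit

def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.95):
    """One Q-learning step: move Q(s, a) toward the bootstrapped target."""
    if done:
        target = reward  # e.g. battery depleted, so no future return
    else:
        target = reward + gamma * max(q_table[next_state].values())
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]

# Usage: a state is (x_cell, y_cell, battery_steps_left) in this sketch.
q_table = make_q_table()
state, next_state = (0, 0, 5), (1, 0, 4)
action = choose_action(q_table, state, epsilon=0.1)
q_update(q_table, state, action, reward=1.0, next_state=next_state, done=False)
print(q_table[state])
```

Including the remaining battery in the state is one simple way to let a tabular policy account for a limited flight budget. The reward passed to `q_update` would be whichever per-step signal is being compared in Figures 2 and 3.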