Guiding drones by information gain
Alouette van Hove, Kristoffer Aalstad, Norbert Pirk
TL;DR
The paper tackles the estimation of unknown greenhouse gas source locations and fluxes from drone-based atmospheric data. It casts source term estimation (STE) as a Bayesian belief Markov decision process (belief-MDP) and compares two informative path planning (IPP) strategies: infotaxis and a deep reinforcement learning (DRL) policy guided by information gain. The DRL agent is trained in a model-based setting using a value function $\hat{v}(s,\mathbf{w})$ and a reward $r=-H(s')$, the negative entropy of the next belief state $s'$. Two neural architectures are considered, a fully connected network and a convolutional neural network (CNN), and both are evaluated over 605 source-term scenarios. Results show that DRL, particularly with the CNN, yields higher success rates and greater reductions in posterior entropy than infotaxis for non-isotropic plumes, whereas the two strategies perform similarly in isotropic cases. The work demonstrates that far-sighted planning via DRL can improve STE in drone-guided sensing and outlines directions for real-world deployment, time-dependent plume modeling, and extension to multiple sources.
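To make the entropy-based reward and the myopic infotaxis rule concrete, the following is a minimal sketch and not the authors' implementation. It assumes a belief over a small discrete set of source hypotheses and a binary detect/no-detect sensor, both chosen here purely for illustration; the function names (`entropy`, `bayes_update`, `reward`, `expected_entropy`, `infotaxis_step`) and the likelihood tables are hypothetical.

```python
import math


def entropy(belief):
    """Shannon entropy (in nats) of a discrete belief over source hypotheses."""
    return -sum(p * math.log(p) for p in belief if p > 0)


def bayes_update(belief, likelihood):
    """Posterior belief after one observation with likelihood p(obs | hypothesis)."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        # Observation impossible under every hypothesis: keep the prior (assumption).
        return list(belief)
    return [u / total for u in unnormalized]


def reward(next_belief):
    """Reward r = -H(s'), the negative entropy of the next belief state."""
    return -entropy(next_belief)


def expected_entropy(belief, likelihoods):
    """Expected posterior entropy at one candidate location.

    `likelihoods` holds one row per possible observation; each row gives
    p(obs | hypothesis) for every hypothesis.
    """
    expected = 0.0
    for likelihood in likelihoods:
        p_obs = sum(b * l for b, l in zip(belief, likelihood))
        if p_obs > 0:
            expected += p_obs * entropy(bayes_update(belief, likelihood))
    return expected


def infotaxis_step(belief, candidates):
    """Myopic rule: pick the candidate with the lowest expected posterior entropy."""
    return min(candidates, key=lambda name: expected_entropy(belief, candidates[name]))


# Illustrative setup (hypothetical numbers): four source hypotheses, uniform prior.
belief = [0.25, 0.25, 0.25, 0.25]
candidates = {
    # Rows: [p(detect | hypothesis)], [p(no detect | hypothesis)].
    "informative": [[0.9, 0.1, 0.1, 0.1], [0.1, 0.9, 0.9, 0.9]],
    "uninformative": [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]],
}
best = infotaxis_step(belief, candidates)
print(best)
```

Infotaxis in this sketch looks only one measurement ahead. A far-sighted policy would instead choose actions using a learned estimate of the cumulative future reward, which is the role the value function $\hat{v}(s,\mathbf{w})$ plays in the DRL approach.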
Abstract
The accurate estimation of locations and emission rates of gas sources is crucial across various domains, including environmental monitoring and greenhouse gas emission analysis. This study investigates two drone sampling strategies for inferring source term parameters of gas plumes from atmospheric measurements. Both strategies are guided by the goal of maximizing information gain attained from observations at sequential locations. Our research compares the myopic approach of infotaxis to a far-sighted navigation strategy trained through deep reinforcement learning. We demonstrate the superior performance of deep reinforcement learning over infotaxis in environments with non-isotropic gas plumes.
