
Guiding drones by information gain

Alouette van Hove, Kristoffer Aalstad, Norbert Pirk

TL;DR

The paper tackles the estimation of unknown greenhouse gas source locations and fluxes from drone-based atmospheric measurements. It casts source term estimation (STE) as a Bayesian belief Markov decision process (belief-MDP) and compares two informative path planning (IPP) strategies: infotaxis and a deep reinforcement learning (DRL) policy guided by information gain. The DRL policy is trained in a model-based setting using a value function $\hat{v}(s,\mathbf{w})$ and a reward $r=-H(s')$, where $H(s')$ is the entropy of the updated belief. Two neural architectures (fully connected and convolutional) are compared, and both strategies are evaluated over $605$ source-term scenarios. Results show that DRL, particularly with a convolutional neural network (CNN), yields higher success rates and greater reductions in posterior entropy than infotaxis in non-isotropic plumes, whereas both strategies perform similarly in isotropic plumes. The work demonstrates that far-sighted planning via DRL can improve STE in drone-guided sensing, and it outlines directions for real-world deployment, time-dependent plume modeling, and extension to multiple sources.
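To make the belief-MDP concrete, here is a minimal Python sketch of one Bayesian belief update followed by the entropy-based reward $r=-H(s')$. It assumes the belief is a discrete distribution over candidate source terms $(x_s, y_s, \phi)$ on a grid and that sensor hit counts are Poisson-distributed around a mean $\mu$. The mean-hit function `mean_hits` and its Gaussian-shaped decay are hypothetical placeholders, not the paper's analytic plume model.

```python
import math


def mean_hits(pos, source, flux):
    """Hypothetical mean hit rate (assumption, NOT the paper's plume model)."""
    dx = pos[0] - source[0]
    dy = pos[1] - source[1]
    # Gaussian-shaped decay with distance plus a small background rate
    return flux * math.exp(-(dx * dx + dy * dy) / 2.0) + 1e-3


def poisson_pmf(h, mu):
    """Probability of observing h hits when the mean is mu (computed in log space)."""
    return math.exp(h * math.log(mu) - mu - math.lgamma(h + 1))


def update_belief(belief, pos, h):
    """Bayes' rule over discrete hypotheses: belief maps (x_s, y_s, phi) -> probability."""
    unnormalised = {
        (xs, ys, phi): p * poisson_pmf(h, mean_hits(pos, (xs, ys), phi))
        for (xs, ys, phi), p in belief.items()
    }
    z = sum(unnormalised.values())  # evidence P(h)
    return {k: v / z for k, v in unnormalised.items()}


def entropy(belief):
    """Shannon entropy in nats; zero-probability hypotheses contribute nothing."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)


def reward(next_belief):
    """r = -H(s'): lower posterior entropy means higher reward."""
    return -entropy(next_belief)


# Usage: uniform prior over a 3 x 3 grid of source locations and two flux levels
prior = {(x, y, f): 1 / 18 for x in range(3) for y in range(3) for f in (1.0, 2.0)}
posterior = update_belief(prior, pos=(0, 0), h=3)
print(round(entropy(prior), 3), round(entropy(posterior), 3), round(reward(posterior), 3))
```

With a uniform prior over 18 hypotheses the entropy starts at $\ln 18$, its maximum; any observation whose likelihood differs across hypotheses makes the posterior non-uniform, lowering the entropy and raising the reward.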

Abstract

Accurately estimating the locations and emission rates of gas sources is crucial across various domains, including environmental monitoring and greenhouse gas emission analysis. This study investigates two drone sampling strategies for inferring source term parameters of gas plumes from atmospheric measurements. Both strategies aim to maximize the information gained from observations at successive sampling locations. Our research compares the myopic approach of infotaxis to a far-sighted navigation strategy trained through deep reinforcement learning. We demonstrate the superior performance of deep reinforcement learning over infotaxis in environments with non-isotropic gas plumes.
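The myopic infotaxis rule can be sketched as a one-step lookahead: for each neighbouring cell, average the posterior entropy over the possible hit counts and move to the cell with the largest expected information gain. This is a simplified, hypothetical sketch that assumes a discrete belief over source terms $(x_s, y_s, \phi)$ and Poisson-distributed hits; `mean_hits` (a placeholder, not the paper's plume model), the four-neighbour move set, and the truncation `h_max` are illustrative choices, not taken from the paper.

```python
import math


def mean_hits(pos, source, flux):
    """Hypothetical stand-in for the paper's plume model (assumption)."""
    d2 = (pos[0] - source[0]) ** 2 + (pos[1] - source[1]) ** 2
    return flux * math.exp(-d2 / 2.0) + 1e-3


def poisson_pmf(h, mu):
    return math.exp(h * math.log(mu) - mu - math.lgamma(h + 1))


def entropy(belief):
    return -sum(p * math.log(p) for p in belief.values() if p > 0)


def expected_entropy(belief, pos, h_max=20):
    """Expected posterior entropy after sampling at pos, averaged over hit counts 0..h_max."""
    total = 0.0
    for h in range(h_max + 1):
        # Joint P(theta, h) for each hypothesis theta = (x_s, y_s, phi)
        joint = {k: p * poisson_pmf(h, mean_hits(pos, k[:2], k[2])) for k, p in belief.items()}
        p_h = sum(joint.values())  # predictive probability of observing h hits
        if p_h <= 0.0:
            continue
        posterior = {k: v / p_h for k, v in joint.items()}
        total += p_h * entropy(posterior)
    return total


def infotaxis_step(belief, pos, moves=((1, 0), (-1, 0), (0, 1), (0, -1))):
    """Greedy (myopic) choice: the neighbour with the largest expected information gain."""
    h_now = entropy(belief)
    best = max(
        moves,
        key=lambda m: h_now - expected_entropy(belief, (pos[0] + m[0], pos[1] + m[1])),
    )
    return (pos[0] + best[0], pos[1] + best[1])


# Usage: the source is known to be at (3, 0) but its flux (1 or 2) is uncertain
belief = {(3, 0, 1.0): 0.5, (3, 0, 2.0): 0.5}
print(infotaxis_step(belief, (0, 0)))
```

A far-sighted DRL policy differs in that it learns a value function over beliefs and can accept a less informative move now in exchange for larger entropy reductions later, whereas infotaxis only ever looks one step ahead.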

Paper Structure

This paper contains 13 sections, 12 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Example of a hit map: (left half) map of the mean number of hits $\mu$ computed with the analytic plume model, and (right half) a possible map of noisy sensor measurements $h$ drawn from the Poisson observation model.
  • Figure 2: Agent navigation in an environment with $\tilde{V} = 2$ and $\tilde{D} = 2$, and source term parameters $\tilde{\mathbf{x}}_{\mathrm{s}} = (9,1)$ and $\tilde{\phi} = 2$: (left) DRL and infotaxis sampling paths, and (right) (cumulative) entropy. Lower entropy indicates a reduced uncertainty in the estimation of the source term parameters.
  • Figure 3: Relative Discrete Ranked Probability Score (DRPS) for source location $\tilde{x}_{s}$ (top), $\tilde{y}_{s}$ (middle) and flux $\tilde{\phi}$ (bottom). Circle radii correspond to the median (dark shade) and 75$^{\textrm{th}}$ percentile (light shade) relative DRPS. DRL results are shown as purple left half-circles and infotaxis results as green right half-circles.
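Figure 3 scores the estimates with the discrete ranked probability score. Below is a minimal sketch of the common unnormalised DRPS for a single ordered marginal, for example the posterior over candidate $\tilde{x}_{s}$ bins. How the paper normalises it into a relative DRPS is not reproduced here, and the function name `drps` is hypothetical.

```python
from itertools import accumulate


def drps(probs, true_index):
    """Unnormalised discrete ranked probability score for one ordered marginal.

    probs: probabilities over ordered bins (e.g. candidate x_s positions)
    true_index: index of the bin containing the true parameter value
    """
    score = 0.0
    for k, cdf in enumerate(accumulate(probs)):
        obs_cdf = 1.0 if k >= true_index else 0.0  # step CDF of the true value
        score += (cdf - obs_cdf) ** 2  # squared gap between the two CDFs
    return score


# Usage: a posterior over three location bins when the true source is in bin 1
print(drps([0.1, 0.6, 0.3], 1))
```

Lower is better: a posterior concentrated on the true bin scores 0, and probability mass placed far from the true bin is penalised more than mass placed nearby, which suits ordered parameters such as location and flux.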