Deep Reinforcement Learning with Dynamic Graphs for Adaptive Informative Path Planning

Apoorva Vashisth, Julius Rückin, Federico Magistri, Cyrill Stachniss, Marija Popović

TL;DR

This work tackles adaptive informative path planning for discovering targets in unknown 3D environments under resource constraints using a UAV. It proposes a deep reinforcement learning framework that integrates a dynamically constructed local action graph with a Gaussian-process model of action utilities, enabling online replanning while ensuring collision-free navigation. A novel reward blends exploration (variance reduction via an $A$-optimal design criterion) and exploitation (target discovery) within an on-policy attention-based policy trained with proximal policy optimization. Empirical results in both orchard monitoring and photorealistic simulations demonstrate superior target discovery, efficient replanning, and practical applicability, with code and models openly available.

Abstract

Autonomous robots are often employed for data collection due to their efficiency and low labour costs. A key task in robotic data acquisition is planning paths through an initially unknown environment to collect observations given platform-specific resource constraints, such as limited battery life. Adaptive online path planning in 3D environments is challenging due to the large set of valid actions and the presence of unknown occlusions. To address these issues, we propose a novel deep reinforcement learning approach for adaptively replanning robot paths to map targets of interest in unknown 3D environments. A key aspect of our approach is a dynamically constructed graph that restricts planning actions local to the robot, allowing us to react to newly discovered static obstacles and targets of interest. For replanning, we propose a new reward function that balances between exploring the unknown environment and exploiting online-discovered targets of interest. Our experiments show that our method enables more efficient target discovery compared to state-of-the-art learning and non-learning baselines. We also showcase our approach for orchard monitoring using an unmanned aerial vehicle in a photorealistic simulator. We open-source our code and model at: https://github.com/dmar-bonn/ipp-rl-3d.
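
To make the exploration-exploitation trade-off in the reward more concrete, the following is a minimal sketch, not the paper's implementation. It assumes the exploration term is the relative reduction in the trace of a Gaussian-process posterior covariance (the $A$-optimal criterion named in the TL;DR) and the exploitation term is a count of newly discovered targets. The RBF kernel, the weight `beta`, and the function names `gp_posterior_cov` and `reward` are hypothetical choices made for illustration.

```python
import numpy as np


def gp_posterior_cov(X_query, X_obs, lengthscale=1.0, noise=1e-2):
    """Posterior covariance of a zero-mean GP at X_query given inputs X_obs.

    Assumption: unit-variance RBF kernel; the paper's kernel may differ.
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)

    K_qq = rbf(X_query, X_query)
    if len(X_obs) == 0:
        return K_qq  # no observations yet: the posterior equals the prior
    K_qo = rbf(X_query, X_obs)
    K_oo = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    return K_qq - K_qo @ np.linalg.solve(K_oo, K_qo.T)


def reward(cov_before, cov_after, new_targets, beta=0.5):
    """Blend exploration and exploitation (beta is a hypothetical weight).

    Exploration: relative reduction of Tr(covariance), i.e. A-optimality.
    Exploitation: number of targets discovered by the executed action.
    """
    explore = (np.trace(cov_before) - np.trace(cov_after)) / np.trace(cov_before)
    exploit = float(new_targets)
    return beta * explore + (1.0 - beta) * exploit


# Usage: one new observation at the origin reduces uncertainty at nearby points.
X_query = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
cov0 = gp_posterior_cov(X_query, np.empty((0, 3)))
cov1 = gp_posterior_cov(X_query, np.array([[0.0, 0.0, 0.0]]))
r = reward(cov0, cov1, new_targets=2)
```

With `beta` close to 1 the sketch rewards uncertainty reduction almost exclusively, while values close to 0 favour revisiting regions where targets have already been found.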

Paper Structure (13 sections, 7 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Our reinforcement learning approach for adaptive informative path planning applied in an orchard monitoring scenario using an unmanned aerial vehicle (UAV). Blue squares are candidate waypoints output by our planner, while the green square is the chosen next waypoint to visit. The inset windows show the onboard camera view and semantic segmentation for discovering apples. By planning collision-free paths for the UAV online, we maximise the number of apple fruits discovered under flight-length constraints.
  • Figure 2: At each mission timestep $t$, our approach samples collision-free waypoints in the robot's local environment. These waypoints, with considered yaw directions, generate action nodes. Each action node is associated with utility value and uncertainty of the utility value, regressed from the Gaussian process, to generate the dynamic graph. Our actor-critic network uses the dynamic graph to output the robot's state value and predicts the next action to execute, which generates a reward and observations from the environment. Blue arrows indicate the robot control loop and green indicate variables stored in the experience buffer to train the actor-critic network via on-policy learning.
  • Figure 3: Examples of testing and training orchard environments used in our experiments. Left: Testing environment with trees placed at random locations. Right: Training environment with trees placed in a regular square array. Thin and dark blue lines represent tree outlines and tree bases, respectively. Green stars indicate fruits.
  • Figure 4: Comparison of our approach against baselines in a UAV-based fruit monitoring scenario. Solid lines indicate means over 500 trials and shaded regions show standard deviations. Using our exploration-exploitation reward function with a dynamic graph action space for reinforcement learning enables more efficient discovery of targets of interest (fruit) during a mission.
  • Figure 5: Comparison of paths planned by (a) the global graph-based CAtNIPP baseline (Cao et al., 2023) and (b) our dynamic graph-based reinforcement learning approach with $K=20$ in a fruit monitoring scenario. The blue line shows the executed UAV path, with the brown and pink circles indicating start and end positions. Red dots are targets not yet observed, while green stars are observed targets.
  • ...and 1 more figures
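
The dynamic graph construction described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the occupancy map is a hypothetical set of occupied voxel indices, waypoints are sampled uniformly in a cube around the robot, the Gaussian process uses an RBF kernel over position only, and all function names and parameters (`sample_free_waypoints`, `gp_predict`, `build_action_nodes`, `radius`, `voxel`) are invented for illustration.

```python
import numpy as np


def sample_free_waypoints(robot_pos, occupied, k=20, radius=2.0, voxel=0.5,
                          rng=None, max_tries=10_000):
    """Rejection-sample up to k waypoints near the robot that avoid occupied voxels."""
    rng = np.random.default_rng() if rng is None else rng
    waypoints = []
    for _ in range(max_tries):
        if len(waypoints) == k:
            break
        cand = robot_pos + rng.uniform(-radius, radius, size=3)
        cell = tuple(int(c) for c in np.floor(cand / voxel))
        if cell not in occupied:  # keep only collision-free candidates
            waypoints.append(cand)
    return np.array(waypoints).reshape(-1, 3)


def gp_predict(X_query, X_obs, y_obs, lengthscale=1.0, noise=1e-2):
    """GP posterior mean and variance of the utility (assumed RBF kernel)."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)

    K_qo = rbf(X_query, X_obs)
    K_oo = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    mean = K_qo @ np.linalg.solve(K_oo, y_obs)
    var = 1.0 - np.sum(K_qo * np.linalg.solve(K_oo, K_qo.T).T, axis=1)
    return mean, var


def build_action_nodes(waypoints, yaws, X_obs, y_obs):
    """One node per (waypoint, yaw): [x, y, z, yaw, utility mean, utility variance]."""
    mean, var = gp_predict(waypoints, X_obs, y_obs)
    nodes = [(*p, yaw, m, v)
             for p, m, v in zip(waypoints, mean, var)
             for yaw in yaws]
    return np.array(nodes)


# Usage: rebuild the local graph at the current robot position.
occupied = {(0, 0, 0)}                       # hypothetical obstacle voxel
robot_pos = np.array([0.25, 0.25, 0.25])
X_obs = np.array([[1.0, 1.0, 1.0], [-1.0, 0.5, 0.0]])  # past measurement positions
y_obs = np.array([1.0, 0.0])                            # observed utilities
wps = sample_free_waypoints(robot_pos, occupied, k=20, rng=np.random.default_rng(0))
yaws = np.deg2rad([0.0, 90.0, 180.0, 270.0])
nodes = build_action_nodes(wps, yaws, X_obs, y_obs)
```

Because the nodes are rebuilt at every timestep from the current occupancy information, newly discovered obstacles simply remove candidate waypoints from the next graph; the policy network would then consume `nodes` as its action set.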