DyPNIPP: Predicting Environment Dynamics for RL-based Robust Informative Path Planning

Srujan Deolasee, Siva Kailas, Wenhao Luo, Katia Sycara, Woojun Kim

TL;DR

DyPNIPP is a robust RL-based IPP framework designed to operate effectively across spatio-temporal environments with varying dynamics. It uses domain randomization to train the agent across diverse environments, and a dynamics prediction model to capture the dynamics of the specific environment and adapt the agent's actions to them.

Abstract

Informative path planning (IPP) is an important planning paradigm for various real-world robotic applications such as environment monitoring. IPP involves planning a path that can learn an accurate belief of the quantity of interest, while adhering to planning constraints. Traditional IPP methods typically require high computation time during execution, giving rise to reinforcement learning (RL) based IPP methods. However, the existing RL-based methods do not consider spatio-temporal environments, which pose their own challenges due to variations in environment characteristics. In this paper, we propose DyPNIPP, a robust RL-based IPP framework designed to operate effectively across spatio-temporal environments with varying dynamics. To achieve this, DyPNIPP incorporates domain randomization to train the agent across diverse environments and introduces a dynamics prediction model to capture the dynamics of the specific environment and adapt the agent's actions to them. Our extensive experiments in a wildfire environment demonstrate that DyPNIPP outperforms existing RL-based IPP algorithms, significantly improving robustness and performance across diverse environment conditions.
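As a rough illustration of the domain randomization idea described in the abstract, the sketch below draws a different environment parameter for each training episode. It is not the authors' implementation: the function names, the uniform sampling, and the range of 1 to 10 for the fuel coefficient are all assumptions made for this example.

```python
import random


def sample_fuel_coefficient(rng, low=1.0, high=10.0):
    """Draw one fuel coefficient for an episode (range and distribution are assumptions)."""
    return rng.uniform(low, high)


def make_episode_configs(num_episodes, seed=0):
    """Build one hypothetical simulator configuration per training episode."""
    rng = random.Random(seed)  # seeded so the sampled curriculum is reproducible
    return [
        {"episode": i, "fuel_coefficient": sample_fuel_coefficient(rng)}
        for i in range(num_episodes)
    ]


configs = make_episode_configs(5)
for config in configs:
    print(config)
```

Each configuration would be handed to the simulator before an episode starts, so the agent sees fire that spreads at a different rate from one episode to the next.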

Paper Structure

This paper contains 20 sections, 2 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The environment evolves under two different dynamics: $F_c=1$ (first row) and $F_c=10$ (second row). $F_c$ is the fuel coefficient in the FireCommander simulator (Seraj et al., 2020). A higher $F_c$ causes the fire to spread faster. Higher values (yellow) denote higher fire intensity.
  • Figure 2: Overview of our approach: (Left) Environment modeling to build the input for the policy and domain randomization for the simulator. (Middle) The overall operation of the proposed IPP algorithm. (Right) Generating the next waypoint (action) using the RL policy and the proposed DPM (blue). DPM predicts the next environment state, and the environment dynamics feature is extracted from the hidden layer to serve as input to the RL policy.
  • Figure 3: t-SNE plot of the environment-context feature. Fuel/vegetation coefficient: 1 (purple), 5 (green), 10 (yellow).
  • Figure 4: Experimental validation of DyPNIPP on the Khepera IV robot. The left figures show the robot performing informative path planning in the arena by observing environmental phenomena at its location, with red areas indicating higher values. The right figures display the predicted environmental phenomena and the ground truth, with time progressing from the first to the third row. The predicted mean improves as the robot explores more areas.
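The Figure 2 caption describes a DPM that predicts the next environment state and exposes a hidden-layer feature to the RL policy. A minimal sketch of that data flow follows. It assumes a single tanh hidden layer, a flat belief vector as input, and simple concatenation to build the policy input. The class and function names are hypothetical, the weights are random and untrained, and none of this reflects the authors' actual architecture.

```python
import math
import random


def init_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class DynamicsPredictor:
    """Hypothetical one-hidden-layer dynamics prediction model."""

    def __init__(self, state_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w_in = init_matrix(hidden_dim, state_dim, rng)
        self.w_out = init_matrix(state_dim, hidden_dim, rng)

    def forward(self, state):
        # The hidden activation doubles as the environment dynamics feature.
        hidden = [math.tanh(h) for h in matvec(self.w_in, state)]
        next_state = matvec(self.w_out, hidden)
        return next_state, hidden


def build_policy_input(observation, dynamics_feature):
    """Concatenate the observation with the dynamics feature (an assumed design)."""
    return list(observation) + list(dynamics_feature)


dpm = DynamicsPredictor(state_dim=4, hidden_dim=3)
belief = [0.2, 0.5, 0.1, 0.9]  # made-up belief values for illustration
predicted_next, feature = dpm.forward(belief)
policy_input = build_policy_input(belief, feature)
print(len(predicted_next), len(feature), len(policy_input))
```

In training, the prediction would be compared against the observed next state to fit the weights, while the policy consumes only the hidden feature alongside its usual input.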