A Graph-Based Reinforcement Learning Approach with Frontier Potential Based Reward for Safe Cluttered Environment Exploration

Gabriele Calzolari, Vidya Sumathy, Christoforos Kanellakis, George Nikolakopoulos

TL;DR

This work tackles safe, efficient exploration of cluttered environments by combining a greedy exploration policy based on a graph neural network (GNN) with a safety shield that enforces feasible actions. It introduces a graph-based observation framework and a frontier-based potential field reward that guides the agent toward informative frontiers while accounting for its proximity to unknown regions. The GNN policy and a critic are trained with proximal policy optimization (PPO), yielding robust map expansion with few safety shield interventions across varied obstacle configurations. The results show high map coverage with low reliance on the safety shield, suggesting practical potential as a high-level planner for real-world UAV forest exploration.

Abstract

Autonomous exploration of cluttered environments requires efficient exploration strategies that guarantee safety against potential collisions with unknown random obstacles. This paper presents a novel approach combining a graph neural network-based greedy exploration policy with a safety shield to ensure safe navigation goal selection. The network is trained using reinforcement learning and the proximal policy optimization algorithm to maximize exploration efficiency while reducing safety shield interventions. If the policy nevertheless selects an infeasible action, the safety shield intervenes and chooses the best feasible alternative, ensuring system consistency. Moreover, this paper proposes a reward function that includes a potential field based on the agent's proximity to unexplored regions and the expected information gain from reaching them. Overall, the approach merges the adaptability of reinforcement learning-driven exploration policies with the guarantees provided by explicit safety mechanisms. Extensive evaluations in simulated environments demonstrate that the approach enables efficient and safe exploration in cluttered environments.
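
The shield behaviour described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `shield_action`, the representation of the policy output as a list of per-goal scores, and the reading of "best feasible alternative" as "the feasible goal with the highest policy score" are all assumptions made here for illustration.

```python
from typing import Sequence, Tuple


def shield_action(scores: Sequence[float],
                  feasible: Sequence[bool]) -> Tuple[int, bool]:
    """Pick a navigation goal greedily, overriding infeasible choices.

    scores   -- hypothetical per-goal policy scores (higher is better)
    feasible -- hypothetical feasibility mask for the same goals
    Returns (chosen goal index, whether the shield intervened).
    """
    if len(scores) != len(feasible):
        raise ValueError("scores and feasible must have the same length")
    if not any(feasible):
        raise ValueError("no feasible navigation goal available")

    # Greedy policy choice: the goal with the highest score.
    greedy = max(range(len(scores)), key=lambda i: scores[i])
    if feasible[greedy]:
        return greedy, False

    # Assumed shield rule: fall back to the best-scoring feasible goal.
    fallback = max((i for i in range(len(scores)) if feasible[i]),
                   key=lambda i: scores[i])
    return fallback, True
```

Under these assumptions, the returned flag could be counted per episode to obtain an intervention ratio like the one reported in the evaluation figures.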

Paper Structure

This paper contains 11 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the proposed safe reinforcement learning framework for exploring unknown, cluttered arenas, using randomly generated occupancy maps to model tree trunk distribution on a plane parallel to the ground. The image highlights key elements required during training (yellow and gray) and execution (gray).
  • Figure 2: Illustration of the exploration graph extraction, used as observation $o_t$, representing the agent's exploration map. The image depicts a section of the known arena with occupied (black), unknown (gray), and free (white) cells. The agent's position, feasible and infeasible next-step navigation goals, and frontiers are marked in blue, green, red, and orange, respectively. The graph structure captures node relationships, while the right side highlights a frontier's local neighborhood and its extracted feature vector. Edge thickness is proportional to the distance between connected nodes.
  • Figure 3: From left to right: (a) Mean total reward curve during training, smoothed and scaled to $[0,1]$. (b) Map coverage percentage relative to traversable regions over agent steps in 1000 test environments. (c) Distribution of the percentages of safety shield interventions over agent actions across 1000 test simulations.
  • Figure 4: From left to right: (a)–(b) Map coverage percentage over agent steps in environments with varying sizes of the exploration arena and non-traversable regions. (c) Distribution of safety shield intervention ratio across agent actions for different tree densities.