Platform-Agnostic Reinforcement Learning Framework for Safe Exploration of Cluttered Environments with Graph Attention

Gabriele Calzolari; Vidya Sumathy; Christoforos Kanellakis; George Nikolakopoulos

TL;DR

Problem: safe, efficient exploration in cluttered environments requires guarantees against collisions. Approach: a platform-agnostic hierarchical reinforcement learning framework combines a graph neural network policy for next-waypoint selection with a safety filter, trained using Proximal Policy Optimization (PPO) and augmented by a frontier- and potential-field-inspired reward. Contributions: (1) a GNN-based exploration policy with attention, (2) a safety filter that overrides infeasible actions with the closest feasible one, (3) a frontier-guided reward design that balances exploration gains and safety, and (4) validation in both simulation and real-world lab experiments demonstrating robust performance. Significance: demonstrates practical deployment potential of learning-based exploration on robotic platforms with explicit safety guarantees in cluttered spaces.

Abstract

Autonomous exploration of obstacle-rich spaces requires strategies that are efficient while guaranteeing safety against collisions with obstacles. This paper investigates a novel platform-agnostic reinforcement learning framework that integrates a graph neural network-based policy for next-waypoint selection with a safety filter that ensures safe mobility. Specifically, the neural network is trained with the Proximal Policy Optimization (PPO) algorithm to maximize exploration efficiency while minimizing safety filter interventions. When the policy proposes an infeasible action, the safety filter overrides it with the closest feasible alternative, ensuring consistent system behavior. In addition, this paper introduces a reward function shaped by a potential field that accounts for both the agent's proximity to unexplored regions and the expected information gain from reaching them. The proposed framework combines the adaptability of reinforcement learning-based exploration policies with the reliability provided by explicit safety mechanisms, which is key to deploying learning-based policies on robotic platforms operating in real-world environments. Extensive evaluations in both simulation and laboratory experiments demonstrate that the approach achieves efficient and safe exploration in cluttered spaces.
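The override rule described in the abstract (replace an infeasible action with the closest feasible alternative) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `safety_filter`, the discrete list of candidate waypoints, the Euclidean notion of "closest", and the `is_feasible` predicate are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def safety_filter(
    proposed: Point,
    candidates: Sequence[Point],
    is_feasible: Callable[[Point], bool],
) -> Tuple[Point, bool]:
    """Return (action, intervened).

    Hypothetical sketch: if the proposed waypoint is feasible it passes
    through unchanged; otherwise it is replaced by the feasible candidate
    nearest to it in Euclidean distance (an assumed metric).
    """
    if is_feasible(proposed):
        return proposed, False
    feasible = [c for c in candidates if is_feasible(c)]
    if not feasible:
        raise ValueError("no feasible action available")
    closest = min(feasible, key=lambda c: math.dist(proposed, c))
    return closest, True


# Toy feasibility check (assumption): keep more than 1.0 away from one obstacle.
obstacle = (2.0, 2.0)


def clear_of_obstacle(p: Point) -> bool:
    return math.dist(p, obstacle) > 1.0


candidates = [(0.0, 0.0), (2.0, 3.5), (4.0, 4.0)]
action, intervened = safety_filter((2.0, 2.5), candidates, clear_of_obstacle)
# action == (2.0, 3.5), intervened == True
```

The returned `intervened` flag is one way the number of filter interventions could be counted during training, for example as a penalty term, since the abstract states that the policy is trained to minimize such interventions. How the paper actually measures or penalizes interventions is not specified in this section.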
