Semantically-driven Deep Reinforcement Learning for Inspection Path Planning
Grzegorz Malczyk, Mihir Kulkarni, Kostas Alexis
TL;DR
This work tackles semantics-aware inspection path planning in unknown environments by learning an end-to-end deep RL policy that jointly inspects target objects and navigates collision-free paths using only onboard local sensing. The method introduces ego-centric occupancy and spatio-visit-entropy maps to provide local spatial and visitation context, while semantically masked depth inputs enable robust, object-focused perception. An APPO-trained, memory-augmented network encodes the 3D occupancy, the spatio-visit-entropy maps, and depth features, and outputs velocity and yaw commands that maximize surface coverage while avoiding obstacles. The reward combines face-mesh coverage, semantic exploration, and collision penalties. In extensive simulations and real-world aerial experiments, the approach generalizes to unseen object geometries and layouts and bridges the sim2real gap, achieving efficient, semantics-driven inspection without prior maps or long-term SLAM. The method is released as open source.
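To make the reward structure described above concrete, the following is a minimal sketch rather than the authors' implementation. The function name `inspection_reward`, the weights, the normalization by the number of mesh faces, and the use of a count of newly explored cells as a stand-in for the semantic exploration term are all assumptions made for illustration.

```python
import numpy as np

def inspection_reward(seen_before, seen_now, newly_explored_cells, collided,
                      w_cov=1.0, w_exp=0.1, collision_penalty=10.0):
    """Hypothetical per-step reward with three terms (all weights are assumed).

    seen_before: boolean array, one entry per mesh face, True if the face was
                 already observed in earlier steps.
    seen_now:    boolean array of the same shape, True if the face is observed
                 in the current step.
    newly_explored_cells: assumed proxy for the semantic exploration term.
    collided:    True if the robot collided in this step.
    """
    seen_before = np.asarray(seen_before, dtype=bool)
    seen_now = np.asarray(seen_now, dtype=bool)

    # Coverage term: fraction of faces observed for the first time this step.
    new_faces = np.logical_and(seen_now, ~seen_before)
    coverage_gain = new_faces.sum() / seen_before.size

    reward = w_cov * coverage_gain + w_exp * newly_explored_cells
    if collided:
        reward -= collision_penalty

    # Return the updated coverage state so the caller can carry it forward.
    updated = np.logical_or(seen_before, seen_now)
    return float(reward), updated
```

Rewarding only first-time observations of a face is one simple way to keep the policy from collecting reward by staring at the same surface; the actual shaping used in the paper may differ.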
Abstract
This paper introduces a novel semantics-aware inspection planning policy derived through deep reinforcement learning. In autonomous informative path planning missions in unknown environments, often only a sparse set of objects of interest needs to be inspected. Reflecting this, the method contributes an end-to-end policy that performs visual inspection of semantic objects while navigating collision-free. Assuming access only to the instantaneous depth map, the associated segmentation image, the ego-centric local occupancy, and the history of past positions in the robot's neighborhood, the method demonstrates robust generalizability and successfully crosses the sim2real gap. Beyond simulations and extensive comparison studies, the approach is verified experimentally onboard a flying robot deployed in novel environments with previously unseen semantics and overall geometric configurations.
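As a rough illustration of two of the inputs listed above, the sketch below masks a depth image with its segmentation image and bins the history of past positions into a robot-centered grid. It is not the paper's pipeline: the function names, the fill value, the grid size and resolution, and the translation-only (no rotation into the body frame) grid are assumptions chosen to keep the example short.

```python
import numpy as np

def mask_depth(depth, seg, target_ids, fill=0.0):
    """Keep depth only at pixels whose segmentation label is a target class.

    All other pixels are set to `fill` (an assumed convention).
    """
    depth = np.asarray(depth, dtype=np.float32)
    keep = np.isin(np.asarray(seg), list(target_ids))
    return np.where(keep, depth, np.float32(fill))

def egocentric_visit_grid(past_positions, robot_position, size=5, resolution=1.0):
    """Count past positions per cell of a size^3 cube centered on the robot.

    Simplification: the grid is only translated to the robot position and is
    not rotated with the robot's heading.
    """
    grid = np.zeros((size, size, size), dtype=np.int32)
    past = np.asarray(past_positions, dtype=np.float64).reshape(-1, 3)
    rel = past - np.asarray(robot_position, dtype=np.float64)

    # Shift by half the grid so the robot sits in the central cell.
    idx = np.floor(rel / resolution + size / 2.0).astype(int)

    # Discard positions that fall outside the local neighborhood.
    inside = np.all((idx >= 0) & (idx < size), axis=1)

    # np.add.at accumulates correctly when several positions share a cell.
    np.add.at(grid, tuple(idx[inside].T), 1)
    return grid
```

Both outputs have fixed shapes regardless of how long the mission has run, which is what lets a policy consume them without a global map.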
