Semantically-driven Deep Reinforcement Learning for Inspection Path Planning

Grzegorz Malczyk, Mihir Kulkarni, Kostas Alexis

TL;DR

This work tackles semantics-aware inspection path planning in unknown environments by learning an end-to-end deep RL policy that jointly inspects target objects and navigates collision-free paths using only onboard local sensing. The method introduces ego-centric occupancy and spatial visit score (SVS) maps to provide local spatial and visitation context, while semantically masked depth inputs focus perception on the object of interest. A memory-augmented network trained with APPO encodes the 3D occupancy and SVS maps together with depth features and outputs velocity/yaw commands that maximize surface coverage while avoiding obstacles; the reward combines face-mesh coverage, semantic exploration, and collision penalties. In simulation and in real-world aerial experiments, the approach generalizes to unseen object geometries and layouts and crosses the sim2real gap, achieving semantics-driven inspection without prior maps or long-term SLAM. The method is released as open source.
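The reward structure named in the summary (coverage, semantic exploration, collision penalty) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `RewardWeights` values, the `step_reward` signature, and the choice to model the exploration term as a bonus per newly visited cell are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the paper's actual values are not stated here."""
    coverage: float = 1.0
    exploration: float = 0.1
    collision: float = 10.0


def step_reward(newly_seen_faces: int, total_faces: int,
                newly_visited_cells: int, collided: bool,
                w: RewardWeights = RewardWeights()) -> float:
    """Combine coverage gain, an exploration bonus, and a collision penalty."""
    if total_faces <= 0:
        raise ValueError("total_faces must be positive")
    # Fraction of the target's mesh faces observed for the first time this step.
    coverage_gain = newly_seen_faces / total_faces
    reward = w.coverage * coverage_gain + w.exploration * newly_visited_cells
    if collided:
        # Assumed flat penalty applied on the step where a collision occurs.
        reward -= w.collision
    return reward
```

Normalizing the coverage term by the total face count keeps it comparable across objects with different mesh resolutions, which is one plausible way to support generalization over object geometries.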

Abstract

This paper introduces a novel semantics-aware inspection planning policy derived through deep reinforcement learning. Reflecting the fact that within autonomous informative path planning missions in unknown environments, it is often only a sparse set of objects of interest that need to be inspected, the method contributes an end-to-end policy that simultaneously performs semantic object visual inspection combined with collision-free navigation. Assuming access only to the instantaneous depth map, the associated segmentation image, the ego-centric local occupancy, and the history of past positions in the robot's neighborhood, the method demonstrates robust generalizability and successful crossing of the sim2real gap. Beyond simulations and extensive comparison studies, the approach is verified in experimental evaluations onboard a flying robot deployed in novel environments with previously unseen semantics and overall geometric configurations.
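Two of the inputs listed in the abstract, the segmentation-masked depth image and the local history of past positions, can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions: `mask_depth` and `local_visit_grid` are hypothetical helpers, masking is assumed to keep depth only on target pixels, and the grid is translated to the robot but not rotated by its yaw, so it is not the paper's exact map construction.

```python
import numpy as np


def mask_depth(depth: np.ndarray, seg: np.ndarray, target_label: int,
               fill: float = 0.0) -> np.ndarray:
    """Keep depth only at pixels whose segmentation label is the target."""
    if depth.shape != seg.shape:
        raise ValueError("depth and seg must share a shape")
    return np.where(seg == target_label, depth, fill)


def local_visit_grid(past_positions: np.ndarray, robot_position: np.ndarray,
                     half_extent: float = 2.0, cells: int = 8) -> np.ndarray:
    """Count past positions per cell of a cube centred on the robot."""
    rel = (np.asarray(past_positions, dtype=float).reshape(-1, 3)
           - np.asarray(robot_position, dtype=float))
    # Discard history that falls outside the local neighbourhood.
    inside = np.all(np.abs(rel) < half_extent, axis=1)
    # Map [-half_extent, half_extent) onto integer cell indices [0, cells).
    idx = np.floor((rel[inside] + half_extent) / (2 * half_extent) * cells)
    idx = np.clip(idx.astype(int), 0, cells - 1)
    grid = np.zeros((cells, cells, cells), dtype=np.int32)
    # np.add.at accumulates correctly when several positions share a cell.
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return grid
```

Because the grid covers only a fixed-size neighbourhood, its memory footprint stays constant regardless of mission length, which fits the stated goal of operating without a global map.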

Paper Structure

This paper contains 19 sections, 10 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: A flying robot conducting an inspection in an industrial environment.
  • Figure 2: The proposed deep RL network for semantics-driven inspection. For a robot navigating an environment containing a target semantic object and obstacles, the local region from which the occupancy and spatial visit score maps are derived is outlined with a dashed line. The two local maps are processed by the 3D encoder, while a semantically masked depth image is fed into the 2D encoder. The resulting latent representations, combined with the agent's state, are passed through an MLP and then a GRU block.
  • Figure 3: Left: Inspection mission in the Aerial Gym Simulator. The environment consists of obstacles and a semantic of interest, shown as a blue cuboid in this scenario. Right, top to bottom: Images used during training.
  • Figure 4: Left: wbt with three concave-shaped bracket toes. Right: Chemical plant with three objects: pipe, tank, and exhaust. In each environment, once the inspection time for a semantic has elapsed, the RL framework switches the semantic label to the next object of interest.
  • Figure 5: The percentage of the cumulative surface of each semantic observed by the camera sensor over time, considering visibility under distance and resolution constraints, for all methods. The transparent region represents the 5th–95th percentile range of surface coverage variability across runs, while the solid line indicates the mean coverage. Two horizontal lines mark $95\%$ and $100\%$ of the ideal feasible coverage.
  • ...and 3 more figures
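
The statistics described in the Figure 5 caption (mean coverage with a 5th–95th percentile band across runs, compared against a coverage threshold) could be computed along these lines. This is a sketch, not the authors' evaluation code: `coverage_band` and `first_time_reaching` are hypothetical names, and the input is assumed to be an array of shape (runs, time steps) holding coverage percentages.

```python
import numpy as np


def coverage_band(runs: np.ndarray):
    """Return the mean, 5th and 95th percentile per time step across runs."""
    runs = np.asarray(runs, dtype=float)
    if runs.ndim != 2:
        raise ValueError("expected an array of shape (n_runs, n_steps)")
    mean = runs.mean(axis=0)
    p5, p95 = np.percentile(runs, [5, 95], axis=0)
    return mean, p5, p95


def first_time_reaching(curve: np.ndarray, threshold: float):
    """Index of the first step where the curve meets the threshold, else None."""
    hits = np.flatnonzero(np.asarray(curve) >= threshold)
    return int(hits[0]) if hits.size else None
```

For example, passing the mean curve and a threshold equal to 95% of the ideal feasible coverage gives the step at which a method first crosses the lower horizontal line in such a plot.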