Semantic-aware Next-Best-View for Multi-DoFs Mobile System in Search-and-Acquisition based Visual Perception

Xiaotong Yu, Chang-Wen Chen

TL;DR

This work tackles efficient semantic-aware visual perception in unknown environments using a multi-DoF mobile system in search-and-acquisition tasks. It introduces a novel information gain that unifies visibility gain and semantic gain into a single utility, together with an adaptive strategy whose termination criterion switches between search and acquisition. The system maintains two maps (an occupancy TSDF map and a labelled TSDF map derived from Mask R-CNN depth segmentation) and uses RRT* for planning within free space. In simulations across three indoor scenes, the approach improves the ROI-to-full reconstruction ratio by up to 27.13% and achieves an average perspective directivity of 0.88234, demonstrating more focused and efficient target perception. The formulation is broadly applicable to semantic-rich perception tasks beyond the tested scenarios.

Abstract

Efficient visual perception using mobile systems is crucial, particularly in unknown environments such as search and rescue operations, where swift and comprehensive perception of objects of interest is essential. In such real-world applications, objects of interest are often situated in complex environments, making the selection of the 'Next Best' view based solely on maximizing visibility gain suboptimal. Semantics, providing a higher-level interpretation of perception, should significantly contribute to the selection of the next viewpoint for various perception tasks. In this study, we formulate a novel information gain that integrates both visibility gain and semantic gain in a unified form to select the semantic-aware Next-Best-View. Additionally, we design an adaptive strategy with a termination criterion to support a two-stage search-and-acquisition manoeuvre on multiple objects of interest, aided by a multi-degrees-of-freedom (Multi-DoFs) mobile system. Several semantically relevant reconstruction metrics, including perspective directivity and region of interest (ROI)-to-full reconstruction volume ratio, are introduced to evaluate the performance of the proposed approach. Simulation experiments demonstrate the advantages of the proposed approach over existing methods, achieving improvements of up to 27.13% in the ROI-to-full reconstruction volume ratio and an average perspective directivity of 0.88234. Furthermore, the planned motion trajectory exhibits better perceiving coverage of the target.
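
The idea of scoring a candidate view by both visibility gain and semantic gain can be illustrated with a minimal sketch. The paper's actual unified formulation is not reproduced in this summary, so the weighted sum below, the `semantic_weight` parameter, and the voxel-count proxies for each gain are all assumptions made for illustration only. The two example candidates mirror the trade-off in the Figure 1 caption: one view sees more of the object of interest, the other sees more unknown space.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ViewCandidate:
    """Hypothetical candidate view with two voxel counts as gain proxies."""
    name: str
    unknown_voxels_visible: int  # assumed proxy for visibility gain
    roi_voxels_visible: int      # assumed proxy for semantic gain


def combined_gain(view: ViewCandidate, semantic_weight: float = 0.5) -> float:
    """Weighted sum of the two proxies (an assumption, not the paper's formula)."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must lie in [0, 1]")
    visibility_term = (1.0 - semantic_weight) * view.unknown_voxels_visible
    semantic_term = semantic_weight * view.roi_voxels_visible
    return visibility_term + semantic_term


def next_best_view(candidates: Sequence[ViewCandidate],
                   semantic_weight: float = 0.5) -> ViewCandidate:
    """Return the candidate with the highest combined gain."""
    if not candidates:
        raise ValueError("at least one candidate view is required")
    return max(candidates, key=lambda view: combined_gain(view, semantic_weight))


# Illustrative numbers only: candidate 1 sees more of the object of interest,
# candidate 2 sees more unknown space.
candidate_1 = ViewCandidate("candidate 1", unknown_voxels_visible=40, roi_voxels_visible=30)
candidate_2 = ViewCandidate("candidate 2", unknown_voxels_visible=60, roi_voxels_visible=5)

# Visibility only: candidate 2 wins (60 vs 40).
print(next_best_view([candidate_1, candidate_2], semantic_weight=0.0).name)
# Equal weighting: candidate 1 wins (35.0 vs 32.5).
print(next_best_view([candidate_1, candidate_2], semantic_weight=0.5).name)
```

With the semantic term switched off, the selection falls back to a purely visibility-driven choice, which is the behaviour the abstract describes as suboptimal when the object of interest sits in a cluttered scene.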

Paper Structure

This paper contains 22 sections, 14 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: (a) When the refrigerator is designated as the object of interest, next view candidate 1 provides higher semantic gain while next view candidate 2 offers higher visibility gain; (b) A capture from the experiment of a view with high semantic gain, with the standing person highlighted.
  • Figure 2: Diagram of the system overview: Both the occupancy map and labelled map are constructed in parallel. The Semantic-aware NBV planner takes two maps as the input. The reconstructed mesh is visualized using the occupancy TSDF map.
  • Figure 3: Sub-figures (a) and (b) show the normalized ROI reconstruction volume and the ROI-to-full reconstruction volume ratio versus simulation time in the Collapsed Room scene; sub-figure (c) shows the distribution of directivity over the completed experiment in the same scene. Sub-figures (d), (e), and (f) show the corresponding results for the Kitchen and Dining Room experiment, and sub-figures (g), (h), and (i) show those for the Kitchen and Dining Room with Multiple Specified Objects. Each sub-figure compares the proposed approach (S-NBV) with RH-NBV [bircher2018receding] and the frontier-based approach [yoder2016autonomous].
  • Figure 4: Original scenes in Gazebo (the red square denotes the specified target): (a) Collapsed Room; (e) Kitchen and Dining Room. Sub-figures (b) and (f) show the motion trajectories planned by the proposed approach, (c) and (g) those planned by RH-NBV [bircher2018receding], and (d) and (h) those planned by the frontier-based approach [yoder2016autonomous]. The trajectories of the different approaches are shown in the same global map; those of the proposed approach demonstrate the best perceiving coverage around the target.
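
The ROI-to-full reconstruction volume ratio plotted in Figure 3 can be sketched as follows. This summary does not give the paper's exact definition, so the sketch assumes the ratio is the reconstructed volume belonging to the objects of interest divided by the total reconstructed volume, with equal-size voxels so that volumes reduce to voxel counts. The function name and the label strings are hypothetical.

```python
from typing import Iterable, Set


def roi_to_full_ratio(reconstructed_labels: Iterable[str],
                      roi_labels: Set[str]) -> float:
    """Fraction of reconstructed voxels whose label is an object of interest.

    Assumes one label per reconstructed voxel and equal voxel size, so the
    volume ratio equals the voxel-count ratio (an illustrative assumption).
    """
    labels = list(reconstructed_labels)
    if not labels:
        return 0.0  # nothing reconstructed yet
    roi_count = sum(1 for label in labels if label in roi_labels)
    return roi_count / len(labels)


# Illustrative labelled voxels: one of four belongs to the object of interest.
voxels = ["refrigerator", "wall", "wall", "floor"]
print(roi_to_full_ratio(voxels, {"refrigerator"}))  # 0.25
```

Under this reading, a higher ratio means a larger share of the reconstruction effort went into the specified targets rather than into the surrounding scene.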