OA-NBV: Occlusion-Aware Next-Best-View Planning for Human-Centered Active Perception on Mobile Robots

Boxun Hu; Chang Chang; Jiawei Ge; Man Namgung; Xiaomin Lin; Axel Krieger; Tinoosh Mohsenin

TL;DR

OA-NBV (Occlusion-Aware Next-Best-View Planning for Human-Centered Active Perception on Mobile Robots) is an occlusion-aware NBV pipeline that autonomously selects the next traversable viewpoint to obtain a more complete view of an occluded human.

Abstract

When our view is blocked, we naturally step sideways or lean to see around the obstacle and recover a more informative observation. Enabling robots to make the same kind of viewpoint choice is critical for human-centered operations, including search, triage, and disaster response, where cluttered environments and partial visibility frequently degrade downstream perception. However, many Next-Best-View (NBV) methods primarily optimize generic exploration or long-horizon coverage, and do not explicitly target the immediate goal of obtaining a single usable observation of a partially occluded person under real motion constraints. We present Occlusion-Aware Next-Best-View Planning for Human-Centered Active Perception on Mobile Robots (OA-NBV), an occlusion-aware NBV pipeline that autonomously selects the next traversable viewpoint to obtain a more complete view of an occluded human. OA-NBV integrates perception and motion planning by scoring candidate viewpoints with a target-centric visibility model that accounts for occlusion, target scale, and target completeness, while restricting candidates to feasible robot poses. OA-NBV achieves a success rate above 90% in both simulation and real-world trials, whereas baseline NBV methods degrade sharply under occlusion. Beyond success rate, OA-NBV improves observation quality: compared to the strongest baseline, it increases normalized target area by at least 81% and keypoint visibility by at least 58% across settings, making it a drop-in view-selection module for diverse human-centered downstream tasks.
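
The view-selection idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Candidate` fields, the linear weighted sum, and the default weight values are assumptions made for illustration. The abstract only states that candidates are scored for occlusion, target scale, and target completeness, and that only feasible robot poses are considered; the weight names `w_o`, `w_a`, and `w_v` follow the caption of Figure 5.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate viewpoint; all terms are assumed normalized to [0, 1]."""
    name: str
    unoccluded: float    # fraction of the target not blocked by obstacles
    area: float          # normalized projected target area (target scale)
    visibility: float    # fraction of target keypoints visible (completeness)
    traversable: bool    # whether the robot can reach this pose


def score(c: Candidate, w_o: float, w_a: float, w_v: float) -> float:
    """Assumed linear combination of the three visibility terms."""
    return w_o * c.unoccluded + w_a * c.area + w_v * c.visibility


def next_best_view(
    candidates: Sequence[Candidate],
    w_o: float = 0.4,   # illustrative weights, not values from the paper
    w_a: float = 0.3,
    w_v: float = 0.3,
) -> Optional[Candidate]:
    """Return the highest-scoring traversable candidate, or None if none is feasible."""
    feasible = [c for c in candidates if c.traversable]
    if not feasible:
        return None
    return max(feasible, key=lambda c: score(c, w_o, w_a, w_v))


if __name__ == "__main__":
    views = [
        Candidate("behind_wall", 0.9, 0.9, 0.9, traversable=False),
        Candidate("left_side", 0.8, 0.5, 0.7, traversable=True),
        Candidate("far_back", 0.3, 0.6, 0.4, traversable=True),
    ]
    best = next_best_view(views)
    print(best.name if best else "no feasible viewpoint")
```

Filtering by traversability before scoring mirrors the abstract's point that the best-looking viewpoint is useless if the robot cannot reach it.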

Paper Structure (14 sections, 3 equations, 9 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Methods
3D Information Extraction
Occlusion-aware NBV Generation
Experiments and Results
Experiment Setup
Weight Tuning
Simulation Results
Real World Results
Discussion
Ablation Study
Limitations and Future Work
Conclusion

Figures (9)

Figure 1: Occlusion-Aware Next-Best-View (OA-NBV) Planning for Human-Centered Active Perception on Mobile Robots. Given an RGB image of a mannequin in an outdoor scene and a point cloud observed from the current viewpoint (pink), OA-NBV identifies the next-best viewpoint (green) that maximizes target visibility under terrain traversability constraints.
Figure 2: Overview of OA-NBV. Given an RGB image and its paired point cloud from the initial viewpoint, OA-NBV proceeds in two stages. 3D Information Extraction: (a) a human mesh is reconstructed from the RGB input, (b) body-part meshes are obtained according to predicted labels, (c) a 2D segmentation mask is generated using SAM 2 and (d) projected onto the point cloud to isolate the 3D target region, (e) which is then registered with the detected body parts, and the resulting transformation is mapped back onto the complete mesh. Occlusion-aware NBV Generation: (f) an elevation map is built from LiDAR measurements to model terrain geometry, (g) traversable pose candidates are sampled on the elevation map, and (h) the next-best viewpoint is selected from the candidates to maximize target visibility while respecting terrain constraints. The robot then navigates to the best viewpoint, acquiring an improved observation of the target.
Figure 3: An illustration of our modified SAT-HMR architecture with an additional human-part classification branch. The input RGB image is encoded by a shared scale-adaptive tokens (SAT) encoder. Two parallel decoders operate on the shared features: a human-mesh decoder that predicts the SMPL mesh and bounding box, and a human-parts decoder with classification heads that assign part labels to human regions. The predicted part cues are used to construct part-specific meshes and enable part-aware mesh-to-point-cloud alignment in Algorithm 1.
Figure 4: Overview of simulation and real-world environments. From top to bottom, the rows correspond to indoor simulation, outdoor simulation, indoor real-world, and outdoor real-world. From left to right, the columns show the initial overall view, the initial camera view, the best overall view, and the best camera view. The red bounding box indicates the human’s location. In simulation environments, the yellow box denotes the camera position.
Figure 5: SNR heatmap of evaluator weight sweep testing. The x-axis is the occlusion weight $w_o$ and the y-axis is the target-area weight $w_a$, with the remaining visibility weight $w_v = 1 - w_o - w_a$. The star marks the best-performing weight setting.
...and 4 more figures
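
The weight sweep described in the Figure 5 caption varies $w_o$ and $w_a$ while fixing $w_v = 1 - w_o - w_a$. A minimal sketch of enumerating such a grid follows; the function name `weight_grid` and the step count are assumptions for illustration, and the paper's actual sweep resolution is not given here.

```python
from typing import List, Tuple


def weight_grid(steps: int = 10) -> List[Tuple[float, float, float]]:
    """Enumerate (w_o, w_a, w_v) with w_v = 1 - w_o - w_a and all weights >= 0.

    Integer indices are used so that every triple sums to 1 without
    accumulating floating-point drift from repeated additions.
    """
    grid = []
    for i in range(steps + 1):            # index for w_o
        for j in range(steps + 1 - i):    # index for w_a, keeping w_v >= 0
            k = steps - i - j             # remaining share goes to w_v
            grid.append((i / steps, j / steps, k / steps))
    return grid


if __name__ == "__main__":
    for w_o, w_a, w_v in weight_grid(steps=4):
        print(f"w_o={w_o:.2f}  w_a={w_a:.2f}  w_v={w_v:.2f}")
```

Each triple could then be passed to an evaluator, and the resulting metric plotted over the $(w_o, w_a)$ plane as in the heatmap.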