Informative Object-centric Next Best View for Object-aware 3D Gaussian Splatting in Cluttered Scenes
Seunghoon Jeong, Eunho Lee, Jeongyun Kim, Ayoung Kim
TL;DR
This work addresses robust view planning in cluttered scenes by introducing an instance-aware NBV policy integrated with object-aware 3D Gaussian Splatting. It injects per-Gaussian object features via a one-hot object vector to compute a confidence-weighted information gain, guiding view selection toward under-observed, task-relevant regions and enabling object-centric reconstruction. The approach yields substantial improvements in depth accuracy and novel-view synthesis on synthetic and real datasets, and extends to real-world robotic manipulation with improved grasping reliability under occlusion. By balancing exploration and exploitation and enabling targeted reconstruction, the method reduces view requirements while enhancing manipulation performance in challenging environments.
Abstract
In cluttered scenes with inevitable occlusions and incomplete observations, selecting informative viewpoints is essential for building a reliable representation. In this context, 3D Gaussian Splatting (3DGS) offers a distinct advantage, as it can explicitly guide the selection of subsequent viewpoints and then refine the representation with new observations. However, existing approaches rely solely on geometric cues, neglect manipulation-relevant semantics, and tend to prioritize exploitation over exploration. To tackle these limitations, we introduce an instance-aware Next Best View (NBV) policy that prioritizes underexplored regions by leveraging object features. Specifically, our object-aware 3DGS distills instance-level information into one-hot object vectors, which are used to compute a confidence-weighted information gain that guides the identification of regions associated with erroneous and uncertain Gaussians. Furthermore, our method can be easily adapted to an object-centric NBV, which focuses view selection on a target object, thereby improving reconstruction robustness to object placement. Experiments demonstrate that our NBV policy reduces depth error by up to 77.14% on the synthetic dataset and 34.10% on the real-world GraspNet dataset compared to baselines. Moreover, compared to targeting the entire scene, performing NBV on a specific object yields an additional reduction of 25.60% in depth error for that object. We further validate the effectiveness of our approach through real-world robotic manipulation tasks.
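
The view-selection idea described above can be illustrated with a small sketch. This is not the paper's formulation: the abstract does not give the exact information-gain expression, so the sketch assumes that each Gaussian carries a probability vector over object IDs, uses the Shannon entropy of that vector as a stand-in for per-Gaussian uncertainty, and weights it by a hypothetical per-view visibility value. The names `entropy`, `view_score`, `next_best_view`, `object_probs`, and `candidate_visibility` are all illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy of one Gaussian's object distribution (zero for a one-hot vector)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def view_score(object_probs, visibility, target=None):
    """Assumed gain: visibility-weighted sum of per-Gaussian uncertainty.

    object_probs: per-Gaussian probability vectors over object IDs
    visibility:   weights in [0, 1], how strongly this view observes each Gaussian
    target:       optional object ID; if given, only Gaussians assigned to it count
    """
    score = 0.0
    for probs, vis in zip(object_probs, visibility):
        label = max(range(len(probs)), key=probs.__getitem__)  # most likely object
        if target is not None and label != target:
            continue  # object-centric mode: skip Gaussians of other objects
        score += vis * entropy(probs)
    return score

def next_best_view(object_probs, candidate_visibility, target=None):
    """Return the index of the highest-scoring candidate view and all scores."""
    scores = [view_score(object_probs, v, target) for v in candidate_visibility]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy data (made up): three Gaussians, two object IDs, three candidate views.
object_probs = [
    [1.0, 0.0],  # confidently object 0
    [0.6, 0.4],  # uncertain, leaning toward object 0
    [0.1, 0.9],  # fairly confident, object 1
]
candidate_visibility = [
    [1.0, 0.0, 0.0],  # view 0 sees only Gaussian 0
    [0.0, 1.0, 0.0],  # view 1 sees only Gaussian 1
    [0.0, 0.0, 1.0],  # view 2 sees only Gaussian 2
]

scene_best, scene_scores = next_best_view(object_probs, candidate_visibility)
object_best, object_scores = next_best_view(object_probs, candidate_visibility, target=1)
```

With this toy data, the scene-level call selects the view that observes the most uncertain Gaussian, while passing `target=1` restricts the score to Gaussians assigned to object 1 and shifts the choice to the view that observes that object. A real system would derive visibility from rendering and would use the authors' confidence-weighted gain in place of the entropy placeholder.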
