Informative Object-centric Next Best View for Object-aware 3D Gaussian Splatting in Cluttered Scenes

Seunghoon Jeong, Eunho Lee, Jeongyun Kim, Ayoung Kim

TL;DR

This work addresses robust view planning in cluttered scenes by introducing an instance-aware NBV policy integrated with object-aware 3D Gaussian Splatting. It injects per-Gaussian object features via a one-hot object vector to compute a confidence-weighted information gain, guiding view selection toward under-observed, task-relevant regions and enabling object-centric reconstruction. The approach yields substantial improvements in depth accuracy and novel-view synthesis on synthetic and real datasets, and extends to real-world robotic manipulation with improved grasping reliability under occlusion. By balancing exploration and exploitation and enabling targeted reconstruction, the method reduces view requirements while enhancing manipulation performance in challenging environments.

Abstract

In cluttered scenes with inevitable occlusions and incomplete observations, selecting informative viewpoints is essential for building a reliable representation. In this context, 3D Gaussian Splatting (3DGS) offers a distinct advantage, as it can explicitly guide the selection of subsequent viewpoints and then refine the representation with new observations. However, existing approaches rely solely on geometric cues, neglect manipulation-relevant semantics, and tend to prioritize exploitation over exploration. To tackle these limitations, we introduce an instance-aware Next Best View (NBV) policy that prioritizes underexplored regions by leveraging object features. Specifically, our object-aware 3DGS distills instance-level information into one-hot object vectors, which are used to compute confidence-weighted information gain that guides the identification of regions associated with erroneous and uncertain Gaussians. Furthermore, our method can be easily adapted to an object-centric NBV, which focuses view selection on a target object, thereby improving reconstruction robustness to object placement. Experiments demonstrate that our NBV policy reduces depth error by up to 77.14% on the synthetic dataset and 34.10% on the real-world GraspNet dataset compared to baselines. Moreover, compared to targeting the entire scene, performing NBV on a specific object yields an additional reduction of 25.60% in depth error for that object. We further validate the effectiveness of our approach through real-world robotic manipulation tasks.
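The confidence-weighted information gain mentioned in the abstract can be illustrated with a toy sketch. The source does not give the formula, so everything specific here is an assumption: the Gauss-Newton blocks `H_k = J_k^T J_k`, the Fisher-information-style score `trace(H_cand @ inv(H_train))`, the hypothetical weight `1 - opacity * object_prob`, and all function names. It is meant only to show how per-Gaussian confidence could steer view selection toward uncertain Gaussians, not to reproduce the authors' implementation.

```python
import numpy as np

def block_hessians(jacobian):
    """Per-Gaussian Gauss-Newton blocks H_k = J_k^T J_k.

    jacobian: (M, K, P) derivative of M rendered values with respect to
    the P parameters of each of K Gaussians. Returns (K, P, P).
    """
    return np.einsum("mkp,mkq->kpq", jacobian, jacobian)

def confidence_weight(opacity, object_prob):
    # ASSUMPTION: hypothetical weight, not taken from the paper. It is
    # large when a Gaussian has low opacity and low object probability.
    return 1.0 - opacity * object_prob

def information_gain(h_train, h_cand, weights, damping=1e-3):
    """Weighted sum over Gaussians of trace(H_cand_k @ inv(H_train_k))."""
    p = h_train.shape[-1]
    # Damping keeps every block invertible; inv() is batched over K.
    prior_cov = np.linalg.inv(h_train + damping * np.eye(p))
    per_gaussian = np.einsum("kpq,kqp->k", h_cand, prior_cov)
    return float(weights @ per_gaussian)

def select_next_best_view(h_train, cand_jacobians, opacity, object_prob):
    """Return the index of the highest-gain candidate and all gains."""
    weights = confidence_weight(opacity, object_prob)
    gains = [information_gain(h_train, block_hessians(j), weights)
             for j in cand_jacobians]
    return int(np.argmax(gains)), gains

# Toy scene: Gaussian 0 is confident, Gaussian 1 is uncertain.
h_train = np.stack([np.eye(2), np.eye(2)])
opacity = np.array([0.9, 0.2])
object_prob = np.array([0.9, 0.3])

view_a = np.zeros((3, 2, 2)); view_a[:, 0, :] = 1.0  # observes Gaussian 0 only
view_b = np.zeros((3, 2, 2)); view_b[:, 1, :] = 1.0  # observes Gaussian 1 only

best, gains = select_next_best_view(h_train, [view_a, view_b], opacity, object_prob)
```

In this toy setup both candidates carry the same raw information, so the choice is decided entirely by the confidence weight, and the view that observes the uncertain Gaussian scores higher. No ground-truth image of a candidate view is needed, since only Jacobians enter the score.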

Paper Structure

This paper contains 23 sections, 12 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Object-centric NBV for manipulation in cluttered scenes. Instead of reconstructing the entire scene, we prioritize task-relevant regions surrounding the target object (tennis ball (red) or cup (blue)). Using these selected views with our object-aware 3DGS yields improved novel view synthesis and depth reconstruction, as shown below.
  • Figure 2: Visualization of the rendering and learning process of the one-hot object vector. Each Gaussian stores logits for $n+1$ classes, including $n$ object categories and the background. The logits are passed through a softmax function and alpha-blended in the same manner as RGB in 3DGS. After normalization, this yields the per-pixel class probability along each ray. By supervising these probabilities with one-hot vectors obtained from instance masks, the logits for each Gaussian can be optimized.
  • Figure 3: Illustration of the overall NBV system. Given an object-segmented 3D Gaussian map $G$ and a candidate camera pose, the CUDA rasterizer computes the Jacobian of the rendering with respect to the Gaussian parameters. Correlations among the parameters of each Gaussian are captured in a block-diagonal Hessian, which is then used to estimate information gain. By comparing gains across candidates, the most informative view is selected without requiring GT images of the candidate views.
  • Figure 4: Illustration of object confidence scores for exploration. The top row shows the 3D representation of a captured scene trained with 6 views, visualized as Gaussian opacity, object probability, and object index. Rectangles of the same color highlight different cases: purple indicates well-optimized regions that require no further observation, while pink and blue denote areas either unseen or poorly localized, thus requiring exploration. We leverage the intuition that regions needing additional views exhibit low opacity and low object probability, and incorporate this insight into our NBV formulation.
  • Figure 5: Qualitative results of the NBV system and 3D reconstruction for the whole scene. On the dish and the snack box, the baseline, which is trained only with RGB and without instance information, lets textures and text leak into its depth results. With our method such artifacts are absent, and the views selected through our NBV capture scene information from all directions, enabling more accurate overall depth reconstruction.
  • ...and 3 more figures
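
The rendering of the one-hot object vector described in the Figure 2 caption can be sketched for a single ray as follows. This is a minimal NumPy illustration, not the authors' CUDA rasterizer: the function names are hypothetical, the per-Gaussian `alphas` are assumed to be already evaluated at the pixel and sorted front to back, and cross-entropy is an assumed choice of loss, since the caption only says the probabilities are supervised with one-hot vectors.

```python
import numpy as np

def softmax(logits):
    """Softmax over the last (class) axis, shifted for stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def render_class_probs(logits, alphas):
    """Alpha-blend per-Gaussian class probabilities along one ray.

    logits: (K, n+1) class logits per Gaussian (n objects + background),
            ordered front to back.
    alphas: (K,) opacity contribution of each Gaussian at this pixel.
    """
    probs = softmax(logits)
    # Transmittance reaching Gaussian i: T_i = prod_{j<i} (1 - alpha_j)
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance      # same blending weights as RGB
    blended = weights @ probs             # (n+1,)
    # Normalise so the result is a per-pixel class distribution.
    return blended / max(blended.sum(), 1e-8)

def cross_entropy(pixel_probs, label):
    """ASSUMED loss against the one-hot target given by an instance mask."""
    return -np.log(max(pixel_probs[label], 1e-8))

# Three Gaussians on one ray, two object classes plus background.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0],
                   [0.0, 0.0, 4.0]])
alphas = np.array([0.8, 0.5, 0.5])

pixel_probs = render_class_probs(logits, alphas)
loss = cross_entropy(pixel_probs, label=0)
```

Because the front Gaussian has the highest blending weight, its class dominates the per-pixel distribution. In a differentiable framework the gradient of the loss would flow back through the same weights to each Gaussian's logits.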