3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection

Yang Cao, Yuanliang Jv, Dan Xu

TL;DR

The paper tackles limitations of NeRF-based 3D object detection (3DOD) by introducing 3D Gaussian Splatting (3DGS) into 3DOD. It proposes Boundary Guidance to yield clearer 3D blob distributions and Box-Focused Sampling to preserve object blobs while pruning background noise, all without extra learnable parameters. In ablations and cross-dataset experiments (ScanNet and ARKitScenes), 3DGS-DET improves over NeRF-Det by +6.6 mAP@0.25 and +8.1 mAP@0.5 on ScanNet and by +31.5 mAP@0.25 on ARKitScenes, highlighting the effectiveness of explicit 3D representations and 2D priors for detection. The work presents an efficient paradigm for view-synthesis-based 3DOD and leaves joint training strategies as an avenue for future work.

Abstract

Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3DOD through view-synthesis representation. However, NeRF faces inherent limitations: (i) limited representational capacity for 3DOD due to its implicit nature, and (ii) slow rendering speeds. Recently, 3D Gaussian Splatting (3DGS) has emerged as an explicit 3D representation that addresses these limitations. Inspired by these advantages, this paper introduces 3DGS into 3DOD for the first time, identifying two main challenges: (i) Ambiguous spatial distribution of Gaussian blobs: 3DGS primarily relies on 2D pixel-level supervision, resulting in unclear 3D spatial distribution of Gaussian blobs and poor differentiation between objects and background, which hinders 3DOD; (ii) Excessive background blobs: 2D images often include numerous background pixels, leading to densely reconstructed 3DGS with many noisy Gaussian blobs representing the background, negatively affecting detection. To tackle challenge (i), we leverage the fact that 3DGS reconstruction is derived from 2D images, and propose an elegant and efficient solution by incorporating 2D Boundary Guidance to significantly enhance the spatial distribution of Gaussian blobs, resulting in clearer differentiation between objects and their background. To address challenge (ii), we propose a Box-Focused Sampling strategy using 2D boxes to generate an object probability distribution in 3D space, allowing effective probabilistic sampling in 3D to retain more object blobs and reduce noisy background blobs. Benefiting from our designs, our 3DGS-DET significantly outperforms the SOTA NeRF-based method, NeRF-Det, achieving improvements of +6.6 on mAP@0.25 and +8.1 on mAP@0.5 for the ScanNet dataset, and an impressive +31.5 on mAP@0.25 for the ARKitScenes dataset.
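
The Box-Focused Sampling idea in the abstract can be illustrated with a minimal single-view sketch. This is not the authors' implementation: the function names, the pinhole camera with points already in the camera frame, the use of the 2D box score as the keep probability, and the `background_prob` floor are all assumptions made for illustration. A blob whose center projects inside a 2D box is kept with a high probability, and every other blob is kept with a low one.

```python
import random

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point; None if behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

def object_probability(point, boxes, intrinsics, background_prob=0.1):
    """Keep-probability of one blob center (hypothetical helper).

    boxes: list of (x0, y0, x1, y1, score) 2D detections in pixel coordinates.
    A point projecting inside a box takes that box's score; otherwise it
    falls back to the assumed background floor.
    """
    prob = background_prob
    uv = project(point, *intrinsics)
    if uv is None:
        return prob
    u, v = uv
    for x0, y0, x1, y1, score in boxes:
        if x0 <= u <= x1 and y0 <= v <= y1:
            prob = max(prob, score)
    return prob

def box_focused_sample(points, boxes, intrinsics, rng, background_prob=0.1):
    """Independently keep each blob with its object probability."""
    return [
        p for p in points
        if rng.random() < object_probability(p, boxes, intrinsics, background_prob)
    ]

if __name__ == "__main__":
    intrinsics = (500.0, 500.0, 320.0, 240.0)   # fx, fy, cx, cy (made-up values)
    boxes = [(300, 220, 340, 260, 0.9)]         # one made-up 2D detection
    points = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]  # inside / outside the box frustum
    print(box_focused_sample(points, boxes, intrinsics, random.Random(0)))
```

With several posed views, the per-view probabilities would have to be combined in some way (for example by taking the maximum); that choice is left open here.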

Paper Structure

This paper contains 18 sections, 15 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: Illustration of the proposed Boundary Guidance. By incorporating Boundary Guidance in the training of 3D Gaussian Splatting (3DGS), we significantly improve the spatial distribution of Gaussian blobs for objects and the background. To show this improved spatial distribution more clearly, we visualize only the positions of the Gaussian blobs and omit their other attributes.
  • Figure 2: Pipeline overview (zoom in for a clearer view). The top row illustrates our basic pipeline detailed in Sec. \ref{sec:methods:basic pipeline}. The bottom row shows our 3DGS-DET pipeline with both Boundary Guidance (Sec. \ref{sec:methods:boundary-guidance}) and Box-Focused Sampling (Sec. \ref{sec:methods:box-focused-sampling}) embedded. Boundary Guidance significantly improves the 3D spatial distribution of Gaussian blobs and thus produces clearer differentiation between objects and the background. Box-Focused Sampling preserves more object-related blobs while suppressing noisy background blobs, compared to random sampling. Together, these two strategies substantially improve 3D detection performance.
  • Figure 3: Illustration of the proposed Boundary Guidance and Box-Focused Sampling strategies. In the top row, Boundary Guidance is constructed in three steps: detecting boundaries on posed images, overlaying them on the images, and training a 3DGS model to achieve a more distinct spatial distribution of Gaussian blobs for objects and the background. In the bottom row, Box-Focused Sampling is achieved by running object detection on posed images. The predicted 2D boxes are projected into the 3D domain to establish object probability spaces, allowing probabilistic sampling of Gaussians to preserve more object blobs and suppress noisy background blobs.
  • Figure 4: Qualitative comparison. Our method identifies more 3D objects in the scene with better positional precision, highlighting the advantages of our approach over NeRF-Det (Xu et al., 2023). In this figure, the scene is represented as a mesh to show the boxes clearly.
  • Figure 5: Analysis of guidance from different priors: (a) Center Point Guidance, (b) Mask Guidance, and (c) Boundary Guidance. In (a) and (b), the spatial distribution of Gaussian blobs for objects like the chair, trash bin and sink is incomplete and ambiguous. Gaussian blobs trained with Boundary Guidance exhibit a clearer spatial distribution. The reason behind this phenomenon is that the center point provides only positional guidance, lacking richer information like shape or size. The mask highlights shape and size but hides the object's surface, reducing texture and geometric information. Boundary Guidance offers positional cues and richer information, such as shape and size, while preserving texture and geometric details on the object's surface, leading to the best performance.
  • ...and 6 more figures
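
The first two Boundary Guidance steps described in the Figure 3 caption (detect boundaries, overlay them on the images) can be sketched as follows. This is an illustrative assumption rather than the paper's code: it takes a binary object mask as given (produced by some 2D model that is not shown), treats a mask pixel as a boundary pixel when any 4-neighbor lies outside the mask, and paints those pixels with a fixed color. The function names and the color are hypothetical.

```python
def mask_boundary(mask):
    """Return a boolean grid marking mask pixels that touch the outside.

    mask: list of rows of booleans. A pixel is a boundary pixel if it is
    inside the mask and at least one 4-neighbor is outside the mask or
    outside the image.
    """
    h, w = len(mask), len(mask[0])
    boundary = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < h and 0 <= nc < w) or not mask[nr][nc]:
                    boundary[r][c] = True
                    break
    return boundary

def overlay_boundary(image, mask, color=(255, 0, 0)):
    """Copy the image and paint boundary pixels with `color` (assumed red)."""
    boundary = mask_boundary(mask)
    return [
        [color if boundary[r][c] else image[r][c] for c in range(len(image[0]))]
        for r in range(len(image))
    ]

if __name__ == "__main__":
    gray = (128, 128, 128)
    image = [[gray] * 5 for _ in range(5)]
    # A 3x3 object mask in the middle of a 5x5 image.
    mask = [[1 <= r <= 3 and 1 <= c <= 3 for c in range(5)] for r in range(5)]
    for row in overlay_boundary(image, mask):
        print(["B" if px != gray else "." for px in row])
```

The overlaid images would then replace the originals as training targets for the 3DGS model, which is the third step in the caption and is outside the scope of this sketch.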