GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector

Zechuan Li, Hongshan Yu, Yihao Ding, Jinhao Qiao, Basim Azam, Naveed Akhtar

TL;DR

GO-N3RDet tackles the challenge of accurate 3D object detection from multi-view images by integrating a geometry-optimized voxel representation with a NeRF-based scene model. It introduces PEOM to embed 3D positional information into voxel features, DIS to focus NeRF sampling on foreground regions, and OOM to enforce cross-view opacity consistency while weighting by ray distance to reduce cumulative errors. The end-to-end framework achieves state-of-the-art results on ScanNet and ARKITScenes, with notable gains over prior NeRF-based detectors and efficient training compared to alternative geometry-driven methods. This work advances practical indoor 3D perception by delivering more reliable geometry and opacity estimates essential for precise 3D detection.

Abstract

We propose GO-N3RDet, a scene-geometry optimized multi-view 3D object detector enhanced by neural radiance fields. The key to accurate 3D object detection is in effective voxel representation. However, due to occlusion and lack of 3D information, constructing 3D features from multi-view 2D images is challenging. Addressing that, we introduce a unique 3D positional information embedded voxel optimization mechanism to fuse multi-view features. To prioritize neural field reconstruction in object regions, we also devise a double importance sampling scheme for the NeRF branch of our detector. We additionally propose an opacity optimization module for precise voxel opacity prediction by enforcing multi-view consistency constraints. Moreover, to further improve voxel density consistency across multiple perspectives, we incorporate ray distance as a weighting factor to minimize cumulative ray errors. Our unique modules synergetically form an end-to-end neural model that establishes new state-of-the-art in NeRF-based multi-view 3D detection, verified with extensive experiments on ScanNet and ARKITScenes. Code will be available at https://github.com/ZechuanLi/GO-N3RDet.

Paper Structure

This paper contains 11 sections, 23 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: The paradigm of exploiting Neural Radiance Fields (NeRF) for multi-view 3D object detection suffers from two critical issues: a lack of 3D positional information modeling and insufficient scene geometry perception. These lead to subpar performance by methods like NeRF-Det r12 (left-most). We show that resolving these issues improves 3D object detection (dotted red boxes).
  • Figure 2: Schematics of GO-N3RDet: Given $N$ input multi-view images $\{I_i\}_{i=1}^N$, we first project them onto a regular 3D grid, assigning each voxel $N$ corresponding image features. The proposed Positional information Embedded voxel Optimization Module (PEOM) fuses these features and optimizes the voxel positions. In the NeRF branch, Double Importance Sampling (DIS) is employed to focus on more foreground points, and opacity is optimized using the Opacity Optimization Module (OOM). The optimized opacity is used to adjust the fused voxel features, which are finally fed into the detection head.
  • Figure 3: Illustration of PEOM. The voxel center is projected onto multi-view (MV) images, where each pixel coordinate predicts an offset. Features are then fused through max pooling, and the voxel position is determined based on the pixel coordinate corresponding to the maximum value. B_Proj. denotes back-projection.
  • Figure 4: Illustration of the effect of Double Importance Sampling. First, uniform sampling is performed, followed by CDF estimation based on the densities of the sampled points. Then, foreground points with higher density are sampled.
  • Figure 5: Illustration of the loss for the Opacity Optimization Module.
  • ...and 2 more figures
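
The sampling step described in the Figure 4 caption (uniform samples, a CDF estimated from their densities, then new samples concentrated where density is high) can be sketched with standard inverse-transform sampling. This is a minimal NumPy illustration, not the authors' implementation: the function names, the piecewise-constant PDF, and the toy density profile are all assumptions made for the example, and the exact form of the paper's "double" scheme is not given in this section.

```python
import numpy as np


def uniform_samples(near, far, n):
    """Evenly spaced depths along a ray between near and far (hypothetical helper)."""
    return np.linspace(near, far, n)


def sample_from_density(depths, densities, n_new, rng):
    """Draw n_new depths by inverse-transform sampling.

    Each interval between consecutive coarse depths is treated as a bin whose
    weight is the mean density of its two endpoints (an assumption for this
    sketch). High-density bins therefore receive more samples.
    """
    # Small epsilon keeps every bin weight positive, so the CDF is strictly increasing.
    bin_weights = 0.5 * (densities[:-1] + densities[1:]) + 1e-8
    pdf = bin_weights / bin_weights.sum()
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    cdf[-1] = 1.0  # guard against floating-point drift in the cumulative sum

    # Uniform random numbers in [0, 1) are mapped back through the CDF.
    u = rng.random(n_new)
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(pdf) - 1)

    # Linear interpolation inside the selected bin.
    t = (u - cdf[idx]) / (cdf[idx + 1] - cdf[idx])
    return np.sort(depths[idx] + t * (depths[idx + 1] - depths[idx]))


# Toy ray: density is high only near depth 3.0 (a stand-in for a foreground object).
rng = np.random.default_rng(0)
coarse = uniform_samples(0.0, 4.0, 9)
density = np.where(np.abs(coarse - 3.0) <= 0.5, 10.0, 0.01)
fine = sample_from_density(coarse, density, 1000, rng)
```

In this toy setup nearly all of the new depths fall in the second half of the ray, around the high-density region, which is the behaviour the caption attributes to the foreground-focused sampling.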