CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images
Guanlin Shen, Jingwei Huang, Zhihua Hu, Bin Wang
TL;DR
CN-RMA tackles indoor 3D object detection from multi-view images by coupling neural implicit scene reconstruction with occlusion-aware feature aggregation. The core idea, Ray Marching Aggregation (RMA), weights the votes that image features cast along each ray using the volume density and transmittance derived from a rough TSDF, which mitigates misprojections caused by occlusion. The method integrates an Atlas-inspired MVS module with an FCAF3D detector in an end-to-end trainable pipeline, trained with a three-stage scheme. Empirically, CN-RMA delivers state-of-the-art mAP@0.25 and mAP@0.5 on ScanNet and ARKitScenes, substantially outperforming prior single-stage and two-stage approaches. The work demonstrates the value of geometry-informed, occlusion-aware aggregation for robust indoor 3D perception and paves the way for broader use of implicit representations in 3D detection tasks.
Abstract
This paper introduces CN-RMA, a novel approach for 3D indoor object detection from multi-view images. We identify the key challenge as the ambiguity of the correspondence between images and 3D space when no explicit geometry is available to provide occlusion information. To address this issue, CN-RMA leverages the synergy of 3D reconstruction networks and 3D object detection networks: the reconstruction network provides a rough Truncated Signed Distance Function (TSDF) and guides image features to vote into 3D space correctly in an end-to-end manner. Specifically, we assign weights to the sampled points of each ray through ray marching, representing the contribution of a pixel in an image to the corresponding 3D locations. These weights are determined by the predicted signed distances, so that image features vote only to regions near the reconstructed surface. Our method achieves state-of-the-art performance in 3D object detection from multi-view images, as measured by mAP@0.25 and mAP@0.5 on the ScanNet and ARKitScenes datasets. The code and models are released at https://github.com/SerCharles/CN-RMA.
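
The ray-marching weights described in the abstract can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names, the `beta` parameter, and the bell-shaped mapping from signed distance to density are assumptions made for illustration, since this section does not give the exact formula. The sketch only shows the general volume-rendering pattern, in which each sample's weight is the transmittance up to that sample times its local opacity.

```python
import numpy as np

def ray_marching_weights(sdf, deltas, beta=0.1):
    """Per-sample voting weights along one ray (hypothetical helper).

    sdf    : predicted signed distances at the ray samples, near to far
    deltas : spacing between consecutive samples
    beta   : assumed sharpness of the SDF-to-density mapping
    """
    sdf = np.asarray(sdf, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)
    # Assumed mapping: density peaks where the signed distance is zero.
    sigma = np.exp(-np.abs(sdf) / beta) / beta
    # Opacity of each ray segment.
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: fraction of the ray that reaches sample i unoccluded.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    return trans * alpha

def aggregate_votes(pixel_feature, weights):
    """Scale one pixel's feature by the weight of every sample on its ray."""
    return weights[:, None] * np.asarray(pixel_feature)[None, :]

# Toy ray that crosses a single surface at depth 1.0.
t = np.linspace(0.0, 2.0, 41)
w = ray_marching_weights(1.0 - t, np.full(41, 0.05), beta=0.05)
votes = aggregate_votes(np.ones(8), w)  # shape (41, 8)
```

In this toy ray the weights peak at the sample on the surface and fall off faster behind it than in front of it, because the transmittance has already been consumed. This is the occlusion-aware behavior the abstract describes: a pixel's feature contributes mostly to 3D locations near the first visible surface.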
