Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection

Feng Liu; Tengteng Huang; Qianjing Zhang; Haotian Yao; Chi Zhang; Fang Wan; Qixiang Ye; Yanzhao Zhou

TL;DR

Ray Denoising improves detection accuracy by strategically sampling along camera rays to construct hard negative examples. These examples compel the model to learn depth-aware features, improving its capacity to distinguish between true and false positives.

Abstract

Multi-view 3D object detection systems often struggle to generate precise predictions because estimating depth from images is difficult, which leads to redundant and incorrect detections. Our paper presents Ray Denoising, a method that improves detection accuracy by strategically sampling along camera rays to construct hard negative examples. These examples, visually challenging to differentiate from true positives, compel the model to learn depth-aware features, thereby improving its capacity to distinguish between true and false positives. Ray Denoising is designed as a plug-and-play module compatible with any DETR-style multi-view 3D detector; it only minimally increases training computational cost and does not affect inference speed. Our comprehensive experiments, including detailed ablation studies, consistently demonstrate that Ray Denoising outperforms strong baselines across multiple datasets. It achieves a 1.9% improvement in mean Average Precision (mAP) over the state-of-the-art StreamPETR method on the NuScenes dataset, and it also shows significant performance gains on the Argoverse 2 dataset, highlighting its generalization capability. The code will be available at https://github.com/LiewFeng/RayDN.
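The idea of sampling along a camera ray can be illustrated with a small geometric sketch. It is not the authors' implementation: the function name `sample_along_ray`, its parameters, and the uniform depth offsets are all assumptions made for illustration. It only shows why such samples are hard negatives: every sampled point lies on the same ray as the object, so it projects to the same image location and differs from the true position in depth alone.

```python
import math
import random


def sample_along_ray(camera_center, object_center, num_samples=3,
                     max_offset=2.0, seed=0):
    """Return 3D points on the camera ray through object_center, shifted in depth.

    Hypothetical helper: the names, defaults, and the uniform offset
    distribution are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    # Vector from the camera to the object; its length is the depth along the ray.
    delta = [o - c for o, c in zip(object_center, camera_center)]
    depth = math.sqrt(sum(d * d for d in delta))
    direction = [d / depth for d in delta]

    samples = []
    for _ in range(num_samples):
        # Perturb depth only, so the point stays on the same camera ray.
        offset = rng.uniform(-max_offset, max_offset)
        new_depth = max(depth + offset, 1e-6)  # keep the point in front of the camera
        samples.append([c + new_depth * u
                        for c, u in zip(camera_center, direction)])
    return samples


if __name__ == "__main__":
    for point in sample_along_ray((0.0, 0.0, 0.0), (6.0, 8.0, 0.0)):
        print(point)
```

In a DETR-style detector, points like these would serve as reference positions for extra training queries labeled as negatives. That wiring is specific to each detector and is not shown here.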

Paper Structure (17 sections, 5 equations, 6 figures, 10 tables)

Introduction
Related Work
Image-based 3D Object Detection
Hard Negative Samples Mining
Denoising in Object Detection
Methodology
Overview
Ray Casting
Sample Generation
Query Denoising
Discussion
Experiment
Dataset and Metrics
Implementation Details
Comparison with State-of-the-Art Methods
...and 2 more sections

Figures (6)

Figure 1: The challenge of estimating depth from images in multi-view 3D object detection leads to duplicate predictions and false positive detections along camera rays. Best viewed in color.
Figure 2: The proposed Ray Denoising approach (right) effectively reduces false positive detections along the ray (highlighted by red rectangles) compared with the previous state-of-the-art method StreamPETR (Wang et al., ICCV 2023) (left). Best viewed by zooming on the screen.
Figure 3: Overall framework of the Ray Denoising approach, a plug-and-play training technique for DETR-style multi-view 3D object detectors that refines the model's ability to distinguish true positives from false positives in depth. Casting rays and sampling depth-aware denoising queries tackles the false positives that arise from the inherent difficulty of visually estimating depth, leading to substantial improvements in detection performance over strong baselines. Best viewed in color and by zooming on the screen.
Figure 4: (a) Distribution comparison showing that the Beta distribution is bounded between -1 and 1, unlike the Laplace and Gaussian distributions, which are unbounded. (b) The Beta distribution family, with the x-range adjusted from $[0,1]$ to $[-1,1]$ using the transformation $y=2x-1$. Best viewed in color.
Figure 5: (a) Visualization of the precision-recall curves at various distance thresholds. Ray Denoising consistently enhances precision across nearly all recall levels, effectively suppressing false positives. (b) Class-wise AP comparison. Ray Denoising outperforms the SOTA StreamPETR in all object classes. Best viewed in color.
...and 1 more figures
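
The rescaling described in the Figure 4 caption, $y = 2x - 1$, maps a Beta-distributed variable from $[0,1]$ to $[-1,1]$. A minimal sketch using Python's standard library is given below. The helper name `sample_scaled_beta` and the shape parameters in the example are hypothetical and are not taken from the paper. The point is only that every sample stays within fixed bounds, whereas Gaussian or Laplace samples can fall arbitrarily far from the center.

```python
import random


def sample_scaled_beta(alpha, beta, size, seed=0):
    """Draw `size` samples from Beta(alpha, beta) rescaled to [-1, 1].

    Hypothetical helper: alpha and beta are illustrative shape parameters,
    not values reported in the paper.
    """
    rng = random.Random(seed)
    # betavariate returns x in [0, 1]; y = 2x - 1 maps it to [-1, 1].
    return [2.0 * rng.betavariate(alpha, beta) - 1.0 for _ in range(size)]


if __name__ == "__main__":
    samples = sample_scaled_beta(alpha=2.0, beta=2.0, size=10000)
    print("min:", min(samples), "max:", max(samples))
    print("mean:", sum(samples) / len(samples))
```

Choosing equal shape parameters gives a distribution symmetric about zero. Unequal parameters skew the samples toward one end of the interval.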
