SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, Xiaolin Hu

TL;DR

This work proposes a fully sparse adaptive feature diffusion network (SAFDNet) for LiDAR-based 3D object detection, designed to address the center feature missing problem, and conducts extensive experiments on Waymo Open, nuScenes, and Argoverse2 datasets.

Abstract

LiDAR-based 3D object detection plays an essential role in autonomous driving. Existing high-performing 3D object detectors usually build dense feature maps in the backbone network and prediction head. However, the computational costs introduced by the dense feature maps grow quadratically as the perception range increases, making these models hard to scale up to long-range detection. Some recent works have attempted to construct fully sparse detectors to solve this issue; nevertheless, the resulting models either rely on a complex multi-stage pipeline or exhibit inferior performance. In this work, we propose SAFDNet, a straightforward yet highly effective architecture, tailored for fully sparse 3D object detection. In SAFDNet, an adaptive feature diffusion strategy is designed to address the center feature missing problem. We conducted extensive experiments on Waymo Open, nuScenes, and Argoverse2 datasets. SAFDNet performed slightly better than the previous SOTA on the first two datasets but much better on the last dataset, which features long-range detection, verifying the efficacy of SAFDNet in scenarios where long-range detection is required. Notably, on Argoverse2, SAFDNet surpassed the previous best hybrid detector HEDNet by 2.6% mAP while being 2.1x faster, and yielded 2.1% mAP gains over the previous best sparse detector FSDv2 while being 1.3x faster. The code will be available at https://github.com/zhanggang001/HEDNet.
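
To make the quadratic-growth claim concrete, the minimal sketch below counts the cells of a dense bird's-eye-view feature map that covers a square region around the sensor. The function name `dense_bev_cells`, the 0.5 m cell size, and the three ranges are illustrative assumptions, not values taken from the paper.

```python
def dense_bev_cells(range_m: float, cell_m: float) -> int:
    """Cells in a dense square map covering [-range_m, range_m] on both axes."""
    side = round(2 * range_m / cell_m)  # cells along one axis
    return side * side                  # a dense map stores every cell, empty or not


# Hypothetical perception ranges in metres; the cell size is also an assumption.
for range_m in (75.0, 150.0, 300.0):
    print(f"range {range_m:>5.0f} m -> {dense_bev_cells(range_m, 0.5):>9d} cells")
```

Under these assumptions, each doubling of the range quadruples the number of dense cells. A sparse representation stores only the occupied voxels, so its size is not tied to the area covered.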

Paper Structure (17 sections, 2 equations, 5 figures, 10 tables)

Figures (5)

  • Figure 1: Comparison among previous one-stage hybrid detectors, the fully sparse detector FSDv1, and our SAFDNet.
  • Figure 2: Overall framework of SAFDNet. Taking the raw point clouds as input, SAFDNet extracts initial 3D sparse feature maps by the voxel feature encoder (VFE), and then it employs the 3D sparse backbone and the 2D sparse backbone to extract high-level sparse features for predictions in the sparse detection head. L, W and H denote length, width, and height of feature maps, respectively.
  • Figure 3: Sparse encoder-decoder block. It adopts regular sparse convolution with stride 2 to down-sample feature maps and uses sparse inverse convolution SPConv to up-sample feature maps.
  • Figure 4: Illustration of uniform and adaptive feature diffusion. The red points denote object centers. The voxels with centers falling within object bounding boxes are indicated in dark orange, while those outside are in dark blue. The expanded features are indicated in light orange or light blue. Empty voxels are indicated in white.
  • Figure 5: Qualitative results on Argoverse2. The red, blue, and green boxes are human annotations, SAFDNet predictions and HEDNet predictions, respectively. The orange points denote the points that fall within the human-annotated boxes. SAFDNet performed comparably to HEDNet in some scenarios (top row). Additionally, SAFDNet demonstrated better predictions for small objects (bottom-left panel) but encountered challenges in direction prediction for partially large objects (bottom-right panel). Red arrows mark the prediction differences.
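
The contrast drawn in the Figure 4 caption between uniform and adaptive feature diffusion can be sketched on a toy 2D grid. This is not the authors' implementation: the function `diffuse`, the radii, and the voxel coordinates are all hypothetical, and the sketch assumes that "adaptive" means voxels inside object boxes spread their features farther than voxels outside.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int]  # (row, col) index on a 2D grid


def diffuse(occupied: Dict[Voxel, bool], fg_radius: int, bg_radius: int) -> Set[Voxel]:
    """Return the voxels that are non-empty after diffusion.

    `occupied` maps each non-empty voxel to True if its center lies inside
    an object box (foreground) and False otherwise (background).
    """
    expanded: Set[Voxel] = set()
    for (row, col), is_foreground in occupied.items():
        radius = fg_radius if is_foreground else bg_radius
        # Spread the voxel to a (2 * radius + 1)-wide square neighbourhood.
        for d_row in range(-radius, radius + 1):
            for d_col in range(-radius, radius + 1):
                expanded.add((row + d_row, col + d_col))
    return expanded


# Hypothetical scene: one foreground voxel on an object's surface,
# one background voxel, and an empty voxel at the object's center.
voxels = {(5, 5): True, (1, 1): False}
object_center = (5, 7)

uniform = diffuse(voxels, fg_radius=1, bg_radius=1)   # same radius everywhere
adaptive = diffuse(voxels, fg_radius=2, bg_radius=0)  # larger radius inside boxes only

print(object_center in uniform, object_center in adaptive)
```

In this toy scene the empty center voxel is reached only when the foreground voxel is given the larger radius, while the background voxel is not expanded at all.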