
SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection

Yun Zhao, Zhan Gong, Peiru Zheng, Hong Zhu, Shaohua Wu

TL;DR

A LiDAR-camera fusion framework for accurate 3D object detection is proposed that follows the BEV-based fusion framework and improves both the camera and the LiDAR encoders.

Abstract

A growing body of research fuses LiDAR and camera information to improve 3D object detection in autonomous driving systems. Recently, a simple yet effective framework that fuses LiDAR and camera features in a unified bird's-eye-view (BEV) space has achieved excellent detection performance. In this paper, we propose SimpleBEV, a LiDAR-camera fusion framework for accurate 3D object detection that follows this BEV-based fusion framework and improves both the camera encoder and the LiDAR encoder. Specifically, we estimate depth from the camera images with a cascade network and rectify the estimated depth using depth information derived from the LiDAR points. In addition, we introduce an auxiliary branch that performs 3D object detection using only the camera-BEV features, so that the camera information is better exploited during training. We also improve the LiDAR feature extractor by fusing multi-scale sparse convolutional features. Experimental results demonstrate the effectiveness of the proposed method: SimpleBEV achieves 77.6% NDS on the nuScenes dataset, showing superior performance on the 3D object detection track.
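The depth rectification step is only summarized above. The following Python sketch shows one plausible form of it under stated assumptions: LiDAR points that are already in the camera frame are projected with a pinhole intrinsic matrix `K` into a sparse depth map, and the predicted depth is then blended toward the LiDAR depth wherever a LiDAR return exists. The function names `project_lidar_depth` and `rectify_depth`, the per-pixel scalar depth, and the blending weight `alpha` are hypothetical illustrations; they are not SimpleBEV's cascade network or its actual rectification procedure, which may, for example, operate on per-pixel depth distributions instead.

```python
import numpy as np


def project_lidar_depth(points_cam, K, height, width, min_depth=0.1):
    """Project LiDAR points (N, 3), given in the camera frame with z forward,
    into a (height, width) sparse depth map; pixels without a point stay 0."""
    pts = points_cam[points_cam[:, 2] > min_depth]  # keep points in front of the camera
    uvw = pts @ K.T                                  # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)  # column index
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)  # row index
    d = pts[:, 2]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # If several points land on one pixel, keep the nearest (smallest depth).
    np.minimum.at(depth, (v[inside], u[inside]), d[inside])
    depth[np.isinf(depth)] = 0.0
    return depth


def rectify_depth(pred_depth, lidar_depth, alpha=1.0):
    """Blend predicted depth toward LiDAR depth where a LiDAR return exists.
    alpha=1.0 overwrites the prediction; this rule is a hypothetical stand-in."""
    mask = lidar_depth > 0
    out = pred_depth.copy()
    out[mask] = (1.0 - alpha) * pred_depth[mask] + alpha * lidar_depth[mask]
    return out


# Usage: toy 4x4 image with a simple intrinsic matrix.
K = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 2.0],    # projects to pixel (u=2, v=2) at depth 2
                   [0.0, 0.0, 5.0],    # same pixel but farther, so it is discarded
                   [0.0, 0.0, -1.0],   # behind the camera, filtered out
                   [10.0, 0.0, 1.0]])  # projects outside the image
lidar_depth = project_lidar_depth(points, K, 4, 4)
pred = np.full((4, 4), 3.0)
print(rectify_depth(pred, lidar_depth)[2, 2])  # prints 2.0
```

Keeping only the nearest point per pixel (via `np.minimum.at`) prevents background points from overwriting foreground ones when several points project to the same pixel.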

Paper Structure

This paper contains 12 sections, 1 equation, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: An overview of the SimpleBEV framework. Two branches separately extract features from the LiDAR points and the multi-view images. The features are then transformed into a unified BEV space. The auxiliary branch is used only during the training phase.
  • Figure 2: Pipeline of depth estimation.
  • Figure 3: Overview of the LiDAR branch structure. $P_{lidar}$ represents the LiDAR points.
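Figures 1 and 3 describe the data flow only at a high level. As a rough sketch, assuming channel-wise concatenation for both the multi-scale LiDAR fusion and the camera-LiDAR BEV fusion (the exact fusion operators are not given here), the pieces could fit together as below. Nearest-neighbour upsampling, the function names, the stride convention, and the auxiliary loss weight `aux_weight` are hypothetical; a real model would apply learned convolutional layers and detection heads rather than returning raw feature maps.

```python
import numpy as np


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_multiscale(features):
    """Upsample each (feature_map, stride) pair to the finest grid and concatenate
    on channels. `stride` is the map's downsampling factor relative to the finest
    BEV grid (hypothetical convention for this sketch)."""
    return np.concatenate([upsample_nearest(f, s) for f, s in features], axis=0)


def fuse_bev(camera_bev, lidar_bev):
    """Concatenate camera-BEV and LiDAR-BEV features along the channel axis."""
    if camera_bev.shape[1:] != lidar_bev.shape[1:]:
        raise ValueError("camera and LiDAR BEV grids must have the same H x W")
    return np.concatenate([camera_bev, lidar_bev], axis=0)


def bev_forward(camera_bev, lidar_multiscale, training):
    """Return the BEV features each detection head would consume."""
    lidar_bev = fuse_multiscale(lidar_multiscale)
    outputs = {"fused": fuse_bev(camera_bev, lidar_bev)}
    if training:
        # Auxiliary branch: camera-BEV features only; not run at inference.
        outputs["camera_only"] = camera_bev
    return outputs


def training_loss(fused_loss, camera_only_loss, aux_weight=0.5):
    """Main detection loss plus a weighted camera-only auxiliary loss
    (the weight value is hypothetical)."""
    return fused_loss + aux_weight * camera_only_loss


# Usage with toy shapes: 2 camera channels, LiDAR features at strides 1 and 2.
camera_bev = np.ones((2, 4, 4))
lidar_feats = [(np.zeros((3, 4, 4)), 1), (np.arange(4.0).reshape(1, 2, 2), 2)]
train_out = bev_forward(camera_bev, lidar_feats, training=True)
infer_out = bev_forward(camera_bev, lidar_feats, training=False)
print(train_out["fused"].shape, sorted(train_out), sorted(infer_out))
# prints: (6, 4, 4) ['camera_only', 'fused'] ['fused']
```

Because the auxiliary output exists only when `training=True`, the camera-only head adds supervision during training without changing the inference path.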