Large receptive field strategy and important feature extraction strategy in 3D object detection

Leichao Cui, Xiuxian Li, Min Meng, Guangyu Jia

TL;DR

This paper tackles LiDAR-based 3D object detection by addressing two core challenges: expanding the receptive field of 3D convolutions and reducing redundant 3D features. It introduces two plug-and-play modules, the Dynamic Feature Fusion Module (DFFM) and the Feature Selection Module (FSM), which adaptively enlarge the receptive field and filter out uninformative features, respectively; the FSM also decouples box fitting from feature extraction. Integrated into voxel-based detectors such as SECOND and VoxelNeXt, the approach yields consistent improvements in 3D mAP, especially for small objects, while also speeding up inference. Results on the KITTI dataset show that DFFM and FSM are complementary and offer practical gains for real-time 3D perception in autonomous driving.

Abstract

The enhancement of 3D object detection is pivotal for precise environmental perception and improved task execution capabilities in autonomous driving. LiDAR point clouds, offering accurate depth information, serve as a crucial source of information for this purpose. Our study focuses on two key challenges in 3D object detection. To tackle the challenge of expanding the receptive field of a 3D convolutional kernel, we introduce the Dynamic Feature Fusion Module (DFFM). This module adaptively expands the 3D convolutional kernel's receptive field while keeping the computational load acceptable. This innovation reduces operations, expands the receptive field, and allows the model to dynamically adjust to the requirements of different objects. We also identify redundant information in 3D features. The Feature Selection Module (FSM) quantitatively evaluates and eliminates unimportant features, separating output box fitting from feature extraction. This enables the detector to focus on critical features, resulting in model compression, reduced computational burden, and less interference from candidate boxes. Extensive experiments confirm that both DFFM and FSM not only improve current benchmark detectors, particularly in small-object detection, but also accelerate the network. Importantly, the two modules are complementary.
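The claim that a larger receptive field can come with fewer operations can be illustrated with a standard piece of arithmetic. The sketch below is not the paper's DFFM. It assumes dense, stride-1, undilated 3D convolutions and compares a stack of 3×3×3 kernels against a single large kernel with the same receptive field. The function names are hypothetical, and the kernel sizes actually used in DFFM are not stated in this summary.

```python
def stacked_receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Per-axis receptive field of `num_layers` stacked stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def kernel_weights(kernel_size: int, num_layers: int = 1) -> int:
    """Weights per (input channel, output channel) pair for cubic 3D kernels."""
    return num_layers * kernel_size ** 3


if __name__ == "__main__":
    for layers in (1, 2, 3):
        rf = stacked_receptive_field(layers)
        small = kernel_weights(3, layers)   # stack of 3x3x3 kernels
        large = kernel_weights(rf)          # one rf x rf x rf kernel
        print(f"{layers} layer(s): receptive field {rf}, "
              f"stacked weights {small}, single-kernel weights {large}")
```

Under these assumptions, three stacked 3×3×3 kernels cover a 7×7×7 neighbourhood with 81 weights per channel pair, whereas one 7×7×7 kernel needs 343. Sparse convolutions, which voxel-based detectors typically use, change the operation count but not this scaling argument.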

Paper Structure

This paper contains 15 sections, 10 equations, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: A large receptive field helps the network capture the object's overall structure and the contextual information of its surroundings.
  • Figure 2: The visual architecture of the point cloud detection network consists of three key components: data processing, feature extraction, and detector. (a) Point clouds undergo various data processing methods for transformation. (b) Feature extraction employs diverse operations. (c) Extracted features are input into the detection network for object detection.
  • Figure 3: DFFM Components: (a) Convolutional decoupling module decomposes large receptive field kernels into smaller ones. (b) Adaptive perception module dynamically adjusts weights of intermediate features across various receptive fields.
  • Figure 4: The general structure of the FSM. (a) The network predicts an importance weight for each voxel. (b) The voxels whose importance weights rank in the top 50% are retained; the rest are discarded. (c) Retained voxel features are multiplied by their weights to create the output features. (d) The detection network labels samples as positive or negative according to whether they fall inside a ground-truth box and adjusts the training accordingly.
  • Figure 5: Network components: feature extraction network, FSM, and 3D detector. (a) Derived from SECOND, the feature extraction network has four layers. The first layer includes two residual blocks, and each of the remaining three layers adds a downsampling block to reduce the feature size. (b) Enhancements include DFFM and FSM, added at the position indicated by the dotted line.
  • ...and 4 more figures
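
Steps (b) and (c) of the FSM described in Figure 4 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the importance weights have already been predicted (step (a)) and are passed in as plain floats; it leaves out the training-time labelling of step (d). The function name `select_voxels` and the `keep_ratio` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_voxels(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    keep_ratio: float = 0.5,
) -> Tuple[List[int], List[List[float]]]:
    """Keep the highest-weighted voxels and scale their features by the weight.

    Returns the indices of the retained voxels (in their original order)
    and the corresponding weighted feature vectors.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    if not weights:
        return [], []

    # Step (b): rank voxels by importance weight and retain the top fraction.
    num_keep = max(1, int(len(weights) * keep_ratio))
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept = sorted(ranked[:num_keep])

    # Step (c): multiply each retained feature vector by its importance weight.
    weighted = [[weights[i] * value for value in features[i]] for i in kept]
    return kept, weighted


if __name__ == "__main__":
    voxel_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    importance = [0.9, 0.1, 0.5, 0.2]  # hypothetical predicted weights
    kept_indices, output_features = select_voxels(voxel_features, importance)
    print(kept_indices, output_features)
```

With four voxels and a keep ratio of 0.5, the two voxels with the largest weights (indices 0 and 2) are retained. In a real detector this selection would run on sparse tensors on the GPU, and the discarded voxels would no longer contribute candidate boxes.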