Hybrid Pooling and Convolutional Network for Improving Accuracy and Training Convergence Speed in Object Detection
Shiwen Zhao, Wei Wang, Junhui Hou, Hai Wu
TL;DR
The paper tackles two challenges in voxel-based 3D object detection: accuracy and training convergence speed. It introduces HPC-Net, a multimodal detector with three innovations: Replaceable Pooling (RP) for flexible 3D/2D pooling, Depth Accelerated Convergence Convolution (DACConv) to speed up training without sacrificing accuracy, and MEFEM, a module that expands receptive fields and fuses multi-scale features for occluded/truncated objects. Evaluations on KITTI (with supplementary Waymo data) show state-of-the-art Car 2D results and competitive Car 3D performance, and ablation studies confirm each component’s contribution to speed and accuracy. Built on a Voxel-RCNN backbone with PENet virtual points, the approach offers practical benefits for autonomous driving by reducing training time while delivering high-precision object detection in challenging scenarios.
Abstract
This paper introduces HPC-Net, a high-precision and rapidly convergent object detection network.
