PDM-SSD: Single-Stage Three-Dimensional Object Detector With Point Dilation

Ao Liang; Haiyang Hua; Jian Fang; Wenyu Chen; Huaici Zhao

TL;DR

PDM-SSD tackles the limited receptive field of point-based 3D detectors by introducing a Point Dilation Mechanism (PDM). PDM dilates each sampled point into a surrounding grid, fills the unoccupied cells using direction and scale information derived from spherical harmonics and Gaussian densities, and compresses the result into sparse 2D grid features. A PointNet-style backbone provides efficient per-point features, the PDM neck expands the feature space, and a hybrid head learns jointly from the dilated grid features and point-wise context. On KITTI, PDM-SSD achieves state-of-the-art performance among single-stage point-based detectors with fast inference (~68 FPS) and shows clear advantages on sparse and incomplete objects. Used as an auxiliary network, PDM further boosts accuracy without reducing speed. The method balances accuracy and deployment practicality, offering a scalable approach to 3D detection in autonomous driving and related applications. The training objective is $L_{all}=L_{sample}+L_{p}+L_{heatmap}+L_2$, where $L_p=L_{vote}+L_{cls}+L_{reg}$ and $L_{reg}=L_{loc}+L_{size}+L_{\text{angle-bin}}+L_{\text{angle-res}}+L_{corner}$. The sampling loss $L_{sample}$ uses a mask $Mask_i$ to emphasize points near object centers, guiding robust learning for sparse targets.
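The additive structure of the training objective can be made concrete with a short sketch. Only the way the terms sum ($L_{all}$, $L_p$, $L_{reg}$) comes from the summary above. The definition of $Mask_i$ as a distance-to-center threshold, the weighted binary cross-entropy used for $L_{sample}$, and the helper names (`center_mask`, `masked_sample_loss`, `total_loss`, `RegressionLosses`) are hypothetical illustrations, not the authors' implementation. $L_2$ is not defined in this summary, so it is simply passed through.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RegressionLosses:
    """Individual box-regression terms; in a detector these come from the head."""
    loc: float
    size: float
    angle_bin: float
    angle_res: float
    corner: float

    def total(self) -> float:
        # L_reg = L_loc + L_size + L_angle-bin + L_angle-res + L_corner
        return self.loc + self.size + self.angle_bin + self.angle_res + self.corner


def center_mask(dist_to_center: Sequence[float], radius: float) -> List[float]:
    """Hypothetical Mask_i: 1.0 for points within `radius` of their object's center, else 0.0."""
    return [1.0 if d <= radius else 0.0 for d in dist_to_center]


def masked_sample_loss(probs: Sequence[float], labels: Sequence[float],
                       mask: Sequence[float], extra_weight: float = 1.0,
                       eps: float = 1e-7) -> float:
    """Hypothetical L_sample: weighted binary cross-entropy in which masked
    (central) points get weight 1 + extra_weight and all other points weight 1."""
    total, weight_sum = 0.0, 0.0
    for p, y, m in zip(probs, labels, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        bce = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
        w = 1.0 + extra_weight * m
        total += w * bce
        weight_sum += w
    return total / weight_sum if weight_sum > 0.0 else 0.0


def total_loss(l_sample: float, l_vote: float, l_cls: float,
               reg: RegressionLosses, l_heatmap: float, l_2: float) -> float:
    # L_p = L_vote + L_cls + L_reg
    l_p = l_vote + l_cls + reg.total()
    # L_all = L_sample + L_p + L_heatmap + L_2 (L_2 is passed through as given)
    return l_sample + l_p + l_heatmap + l_2


if __name__ == "__main__":
    mask = center_mask([0.3, 1.8, 0.5], radius=1.0)
    l_sample = masked_sample_loss([0.9, 0.2, 0.7], [1.0, 0.0, 1.0], mask)
    reg = RegressionLosses(loc=0.1, size=0.1, angle_bin=0.05, angle_res=0.05, corner=0.2)
    print(total_loss(l_sample, 0.2, 0.3, reg, 0.4, 0.1))
```

Weighting central points rather than discarding the rest keeps every sampled point in the gradient while still biasing the sampler toward object centers. This is one plausible reading of "emphasize", not a confirmed detail of PDM-SSD.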

Abstract

Current point-based detectors can only learn from the provided points, which leaves them with limited receptive fields and insufficient global learning capability for sparse and incomplete targets. In this paper, we present a novel Point Dilation Mechanism for single-stage 3D detection (PDM-SSD) that takes advantage of both point and grid representations. Specifically, we first use a PointNet-style 3D backbone for efficient feature encoding. Then, a neck with the Point Dilation Mechanism (PDM) expands the feature space in two key steps: point dilation and feature filling. Point dilation expands each sampled point into a grid of fixed size centered on that point in Euclidean space. Feature filling then assigns features to the unoccupied grid cells so that they can take part in backpropagation, using spherical harmonic coefficients to model direction and a Gaussian density function to model scale. Next, we associate multiple dilation centers, fuse their coefficients, and obtain sparse grid features through height compression. Finally, we design a hybrid detection head for joint learning: on the one hand, it predicts a scene heatmap that complements the voting point set to improve detection accuracy; on the other hand, it calibrates the target probability of detected boxes through feature fusion. On the challenging Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, PDM-SSD achieves state-of-the-art results for multi-class detection among single-modal methods, with an inference speed of 68 frames per second. We also demonstrate the advantages of PDM-SSD in detecting sparse and incomplete objects through numerous object-level instances. Additionally, PDM can serve as an auxiliary network that establishes a connection between sampling points and object centers, improving model accuracy without sacrificing inference speed. Our code will be available at https://github.com/AlanLiangC/PDM-SSD.git.
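The dilation, filling, fusion, and height-compression steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The following choices are all hypothetical: a cubic (2r+1)^3 neighbourhood on a voxel grid of size `voxel`, a degree-1 real spherical-harmonic expansion for the direction term, an isotropic Gaussian with per-point `sigma` for the scale term, clamping negative SH responses to zero, fusing overlapping dilation centers by summing their weighted features, and compressing height by an element-wise max over z. In PDM-SSD the coefficients and scales would be predicted by the network; here they are passed in directly.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

SH_C0 = 0.5 / math.sqrt(math.pi)          # real spherical harmonic Y_0^0
SH_C1 = math.sqrt(3.0 / (4.0 * math.pi))  # prefactor of real Y_1^{-1}, Y_1^0, Y_1^1

Cell = Tuple[int, int, int]


def sh_degree1(coeffs: Sequence[float], direction: Sequence[float]) -> float:
    """Evaluate a degree-1 real SH expansion at a unit direction.
    coeffs = (c_00, c_1-1, c_10, c_11); Y_1^-1 ~ y, Y_1^0 ~ z, Y_1^1 ~ x."""
    x, y, z = direction
    return (coeffs[0] * SH_C0 + coeffs[1] * SH_C1 * y
            + coeffs[2] * SH_C1 * z + coeffs[3] * SH_C1 * x)


def dilate_and_fill(points, features, sh_coeffs, sigmas,
                    voxel: float = 0.2, radius: int = 1) -> Dict[Cell, List[float]]:
    """Hypothetical PDM-style step: splat each sampled point's feature into the
    (2*radius+1)^3 cells around it, weighted by a direction (SH) term and a scale
    (Gaussian) term. Cells reached by several dilation centers sum their contributions."""
    grid: Dict[Cell, List[float]] = {}
    for p, f, c, s in zip(points, features, sh_coeffs, sigmas):
        center = tuple(math.floor(v / voxel) for v in p)
        for off in product(range(-radius, radius + 1), repeat=3):
            cell = tuple(ci + oi for ci, oi in zip(center, off))
            cell_center = [(ci + 0.5) * voxel for ci in cell]
            d = [cc - pv for cc, pv in zip(cell_center, p)]
            dist = math.sqrt(sum(di * di for di in d))
            if dist > 0.0:
                angular = sh_degree1(c, [di / dist for di in d])
            else:
                angular = c[0] * SH_C0  # only the isotropic term is defined at d = 0
            angular = max(angular, 0.0)  # assumption: clamp negative SH responses
            w = angular * math.exp(-dist * dist / (2.0 * s * s))
            if w <= 0.0:
                continue  # this dilation center contributes nothing to the cell
            vec = grid.setdefault(cell, [0.0] * len(f))
            for k, fk in enumerate(f):
                vec[k] += w * fk
    return grid


def height_compress(grid: Dict[Cell, List[float]]) -> Dict[Tuple[int, int], List[float]]:
    """Collapse the z index by element-wise max, giving sparse BEV features keyed by (i, j)."""
    bev: Dict[Tuple[int, int], List[float]] = {}
    for (i, j, _k), feat in grid.items():
        if (i, j) in bev:
            bev[(i, j)] = [max(a, b) for a, b in zip(bev[(i, j)], feat)]
        else:
            bev[(i, j)] = list(feat)
    return bev


if __name__ == "__main__":
    # One point, isotropic SH (only c_00 set so the angular term equals 1).
    grid = dilate_and_fill([(0.1, 0.1, 0.1)], [[1.0]],
                           [(1.0 / SH_C0, 0.0, 0.0, 0.0)], [0.2])
    bev = height_compress(grid)
    print(len(grid), len(bev))  # 27 dilated cells, 9 BEV cells after compression
```

With only the `c_11` coefficient set, the same point fills just the nine cells on its +x side. This shows how the direction term lets a point extend its features preferentially toward where the object likely continues, while the Gaussian term makes the filled features fade with distance.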