
DBQ-SSD: Dynamic Ball Query for Efficient 3D Object Detection

Jinrong Yang, Lin Song, Songtao Liu, Weixin Mao, Zeming Li, Xiaoping Li, Hongbin Sun, Jian Sun, Nanning Zheng

TL;DR

The paper tackles the inefficiency of point-based 3D object detectors caused by redundant background points and fixed multi-scale grouping. It introduces Dynamic Ball Query (DBQ), a data-driven gating mechanism that adaptively selects a subset of queries for each grouping radius. This assigns each point its own receptive field, and the whole mechanism can be trained end-to-end at reduced computational cost. DBQ is integrated into an IA-SSD backbone, and its gates are trained with Gumbel-Sigmoid relaxation under a latency-aware (resource budget) objective. The method delivers substantial speedups (up to 162–223 FPS on KITTI and 27–30 FPS on Waymo and ONCE) while maintaining competitive or improved accuracy. Ablations attribute most of the gains to suppressing background points and to point-wise routing with per-group gating. Consistent results across the three datasets suggest strong generalization and a practical path toward real-time 3D detection.
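The gating idea above can be illustrated with a minimal, framework-free Python sketch. It assumes one scalar logit per (point, group) pair. The names `gumbel_sigmoid`, `hard_gate`, `inference_gate`, and `budget_loss` are hypothetical. The budget penalty (λ times the fraction of total cost left switched on) is an assumed form, not the paper's exact loss. Pure Python has no autograd, so the straight-through estimator is only described in a comment.

```python
import math
import random

def stable_sigmoid(x):
    """Logistic function that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def gumbel_sigmoid(logit, tau=1.0, rng=None):
    """Relaxed Bernoulli (Gumbel-Sigmoid) gate used during training.

    The difference of two independent Gumbel(0, 1) samples is Logistic(0, 1),
    so logistic noise log(u) - log(1 - u) is added to the logit directly.
    """
    rng = rng if rng is not None else random
    eps = 1e-10
    u = min(max(rng.random(), eps), 1.0 - eps)  # keep both logs finite
    noise = math.log(u) - math.log1p(-u)
    return stable_sigmoid((logit + noise) / tau)

def hard_gate(soft):
    """Binarize a relaxed gate. In PyTorch, the straight-through trick
    (hard - soft.detach() + soft) would still pass gradients to `soft`."""
    return 1 if soft > 0.5 else 0

def inference_gate(logit):
    """Noise-free test-time gate: sigmoid(logit) > 0.5 exactly when logit > 0."""
    return 1 if logit > 0 else 0

def budget_loss(gates, costs, lam):
    """Hypothetical resource-budget penalty: lam times the fraction of the
    total cost (e.g. per-group latency or FLOPs) whose gates are switched on."""
    total = sum(costs)
    used = sum(g * c for g, c in zip(gates, costs))
    return lam * used / total
```

For example, `gumbel_sigmoid(1.5, tau=0.5, rng=random.Random(0))` returns a soft gate in (0, 1). Lowering `tau` pushes samples toward 0 or 1, which is why such relaxations are often annealed during training. At test time the noise is dropped and `inference_gate` gives a deterministic on/off decision.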

Abstract

Many point-based 3D detectors adopt point-feature sampling strategies that drop some points for efficient inference. These strategies are typically based on fixed, handcrafted rules, which makes it difficult to handle complicated scenes. In contrast, we propose a Dynamic Ball Query (DBQ) network that adaptively selects a subset of input points according to the input features and assigns each selected point a feature transform with a suitable receptive field. It can be embedded into several state-of-the-art 3D detectors and trained end-to-end, which significantly reduces the computational cost. Extensive experiments show that our method increases inference speed by 30%-100% on the KITTI, Waymo, and ONCE datasets. Specifically, our detector reaches 162 FPS on KITTI and 30 FPS on Waymo and ONCE without performance degradation. Because redundant points are skipped, some evaluation metrics even improve significantly. Code will be released at https://github.com/yancie-yjr/DBQ-SSD.
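To make "select a subset of points and assign each a receptive field" concrete, here is a hypothetical sketch of a multi-scale ball query driven by precomputed binary gates. Brute-force neighbor search stands in for the GPU ball-query kernels that real detectors use. The function `dynamic_ball_query` and its gate layout are assumptions for illustration, not the authors' code.

```python
import math

def ball_query(center, points, radius, max_samples):
    """Indices of up to max_samples points within `radius` of `center`,
    in input order (brute force, O(N) per query)."""
    hits = []
    for i, p in enumerate(points):
        if math.dist(center, p) <= radius:
            hits.append(i)
            if len(hits) == max_samples:
                break
    return hits

def dynamic_ball_query(queries, points, radii, max_samples, gates):
    """Group only the (query, radius) pairs whose gate is on.

    gates[q][g] is 1 if query q is routed to group g (radius radii[g]), else 0.
    Returns {g: {q: neighbor_indices}} holding the active pairs only.
    """
    groups = {g: {} for g in range(len(radii))}
    for q, center in enumerate(queries):
        for g, radius in enumerate(radii):
            if gates[q][g]:
                groups[g][q] = ball_query(center, points, radius, max_samples)
    return groups
```

A query whose gates are all zero never appears in the output, which corresponds to the "Kill" case in Figure 3(d). Because skipped pairs are never grouped, a per-group MLP applied afterwards would only process the active pairs.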

Paper Structure

This paper contains 28 sections, 10 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Statistics of latency, background ratio, and object size distribution on the KITTI val and Waymo val sets. (a) The MLP network accounts for the largest share of latency; "Q & G" denotes the query and grouping operations. (b) Redundant background points dominate the input points of every stage. (c) Distribution of object sizes, measured as $\sqrt[3]{volume}$, where $volume$ is the volume of the ground-truth box.
  • Figure 2: The pipeline of dynamic ball query in a set abstraction (SA) layer. 'NS' denotes nearest sampling, which samples the query features from the input features. The query multiplexer generates gating masks that adaptively select a subset of the input queries for each group. The remap operator maps the sparse features back to a dense form.
  • Figure 3: Effects of Dynamic Ball Query, all evaluated on the KITTI val set. $\lambda$ is the scale parameter of the resource budget loss (see the paper's loss equation). Latency is measured on a single RTX 2080Ti GPU with a batch size of 16. (a) Car-class accuracy compared with the overall latency distribution. (b) Latency reduction of the query & grouping operation and the MLP network in different SA layers. (c) Activation distribution of point features in different SA layers. (d) Proportion of point features passing through the different groups of MSG: "Small" and "Large" mean that only the small-radius or only the large-radius group branch is activated, "Kill" means all groups are blocked, and "Small & Large" means the point passes through groups at all scales.
  • Figure 4: Visualization results on the KITTI val set. The 3D boxes are predicted boxes; green, cyan, and yellow represent Car, Pedestrian, and Cyclist, respectively. Red and white points represent activated and blocked points, respectively. "Small" and "Large" denote the group scale in MSG, and the number in parentheses is the index of the SA layer.
  • Figure 5: Visualization results on the Waymo val set. The red and green 3D boxes are ground-truth and predicted boxes, respectively. Green, cyan, and yellow represent Car, Pedestrian, and Cyclist, respectively. Red and white points represent activated and blocked points, respectively. "Small" and "Large" denote the group scale in MSG, and the number in parentheses is the index of the SA layer.
  • ...and 2 more figures
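Figure 2's remap operator returns sparse per-group features to a dense layout. The sketch below is a hypothetical illustration with made-up function names. It makes two assumptions of its own: blocked (query, group) pairs are zero-filled, and the per-group outputs are concatenated along the channel axis, PointNet++-style.

```python
def remap_to_dense(sparse, num_queries, channels):
    """Scatter {query_index: feature_vector} into a dense num_queries x channels
    table; queries absent from `sparse` (blocked) get zero vectors (assumed)."""
    dense = [[0.0] * channels for _ in range(num_queries)]
    for q, feat in sparse.items():
        if len(feat) != channels:
            raise ValueError(f"query {q}: expected {channels} channels, got {len(feat)}")
        dense[q] = [float(x) for x in feat]
    return dense

def concat_groups(dense_per_group):
    """Concatenate per-group dense features along the channel axis (MSG style)."""
    num_queries = len(dense_per_group[0])
    return [
        [x for group in dense_per_group for x in group[q]]
        for q in range(num_queries)
    ]
```

Zero-filling keeps the tensor shapes fixed for the downstream layers, so only the gated-off grouping and MLP work is skipped.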