VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation

Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, Krzysztof Czarnecki

TL;DR

Instead of aggregating the entire scene using a fixed number of frames, VADet performs aggregation per object, with the number of frames determined by an object's observed properties, such as speed and point density.

Abstract

Input aggregation is a simple technique used by state-of-the-art LiDAR 3D object detectors to improve detection. However, increasing aggregation is known to yield diminishing returns and can even degrade performance, because objects respond differently to the number of aggregated frames. To address this limitation, we propose an efficient adaptive method, which we call Variable Aggregation Detection (VADet). Instead of aggregating the entire scene using a fixed number of frames, VADet performs aggregation per object, with the number of frames determined by an object's observed properties, such as speed and point density. VADet thus reduces the inherent trade-offs of fixed aggregation and is not architecture-specific. To demonstrate its benefits, we apply VADet to three popular single-stage detectors and achieve state-of-the-art performance on the Waymo dataset.
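To make the idea of per-object aggregation concrete, the following is a minimal sketch and not the authors' implementation. The function names (`select_num_frames`, `aggregate_per_object`), the frame budgets, and the rule that faster or denser objects receive fewer frames are all assumptions for illustration; the abstract only states that the frame count depends on observed properties such as speed and point density. The thresholds (0.2 m/s, 10 m/s, 100 pts/m$^2$) are borrowed from the bins in the figure captions. Each object is assumed to come with a history of already-cropped, motion-compensated point sets containing at least one frame.

```python
import numpy as np


def select_num_frames(speed_mps: float, density: float, max_frames: int = 16) -> int:
    """Hypothetical rule mapping observed speed and point density to a frame count."""
    if speed_mps < 0.2:        # stationary: assume the full budget helps
        n = max_frames
    elif speed_mps < 10.0:     # slow: assume half the budget
        n = max_frames // 2
    else:                      # fast: assume a quarter of the budget
        n = max_frames // 4
    if density > 100.0:        # dense objects: assume fewer frames are needed
        n = n // 2
    return max(1, n)


def aggregate_per_object(history, speeds, densities, max_frames: int = 16):
    """Concatenate the newest k frames of each object, with k chosen per object.

    history[i] is a list of (N, 3) point arrays for object i, newest frame first.
    """
    clouds = []
    for frames, speed, density in zip(history, speeds, densities):
        k = min(select_num_frames(speed, density, max_frames), len(frames))
        clouds.append(np.concatenate(frames[:k], axis=0))
    return clouds
```

Under these assumed rules, a stationary, sparse object with 16 frames of history keeps all 16 frames, while a fast, dense object keeps only its 2 newest frames. A fixed-aggregation baseline corresponds to returning the same constant from `select_num_frames` for every object.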

Paper Structure

This paper contains 22 sections, 6 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: Performance trade-off between stationary ($<$0.2 m/s), slow ([0.2,10) m/s), and fast ($\geq$10 m/s) vehicles. The relative AP plot illustrates the improvement/degradation relative to using 3-frame fixed aggregation. The other two panels show examples of stationary and fast-moving vehicles after 16-frame fixed aggregation.
  • Figure 2: Qualitative results comparing 3-frame and 16-frame fixed aggregation with VADet. Red and green bounding boxes are ground truth and predictions, respectively. Predictions are filtered with 0.5 confidence threshold for visual clarity.
  • Figure 3: AP vs. the number of frames for stationary ($<$0.2 m/s), slow ([0.2,10) m/s), and fast-moving ($\geq$10 m/s) vehicles.
  • Figure 4: AP vs. the number of frames for sparse ($<$2 pts/m$^2$), medium ([2,100] pts/m$^2$), and dense ($>$100 pts/m$^2$) dynamic vehicles.