Adaptive LiDAR Scanning: Harnessing Temporal Cues for Efficient 3D Object Detection via Multi-Modal Fusion

Sara Shoouri, Morteza Tavakoli Taba, Hun-Seok Kim

TL;DR

The paper tackles the energy and bandwidth costs of dense LiDAR scans in multi-modal 3D object detection by exploiting temporal continuity. It introduces a two-stage adaptive sensing pipeline comprising a History-Aware Query Predictor and a Differentiable Mask Generator to produce ROI masks $M^t$ and guide frame-by-frame LiDAR sampling via Gumbel-Softmax, complemented by differentiable voxelization and a CVaR-based loss for robustness. Integrated with a state-of-the-art camera-LiDAR fusion backbone, the method achieves over $65\%$ LiDAR sparsity on nuScenes and Lyft without sacrificing, and often improving, detection accuracy. This approach reduces sensor data and computational load, enabling more practical deployment of perceptual systems on energy-constrained platforms while maintaining high detection performance.
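The summary above mentions a CVaR-based loss without giving its form. As a rough illustration only, the snippet below implements the standard empirical Conditional Value-at-Risk estimator: the mean of the worst fraction of per-sample losses. The function name `cvar`, the tail fraction `alpha`, and the convention that `alpha` denotes the size of the worst-case tail are assumptions made for this sketch, not details taken from the paper.

```python
import math


def cvar(losses, alpha=0.2):
    """Empirical CVaR: mean of the worst ceil(alpha * n) losses.

    `alpha` is a hypothetical tail fraction in (0, 1]; alpha=1.0
    reduces to the ordinary mean loss.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # Number of samples in the worst-case tail (at least one).
    k = max(1, math.ceil(alpha * len(losses)))
    # Largest losses first, then average the tail.
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


per_sample_losses = [1.0, 2.0, 3.0, 4.0]
tail_loss = cvar(per_sample_losses, alpha=0.5)  # averages the two largest losses
mean_loss = cvar(per_sample_losses, alpha=1.0)  # ordinary mean
```

Averaging only the tail puts more weight on the hardest samples than a plain mean does, which is the usual motivation for using CVaR as a robustness objective.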

Abstract

Multi-sensor fusion using LiDAR and RGB cameras significantly enhances 3D object detection. However, conventional LiDAR sensors perform dense, stateless scans, ignoring the strong temporal continuity in real-world scenes. This leads to substantial sensing redundancy and excessive power consumption, limiting their practicality on resource-constrained platforms. To address this inefficiency, we propose a predictive, history-aware adaptive scanning framework that anticipates informative regions of interest (ROIs) based on past observations. Our approach introduces a lightweight predictor network that distills historical spatial and temporal contexts into refined query embeddings. These embeddings guide a differentiable Mask Generator network, which leverages Gumbel-Softmax sampling to produce binary masks identifying critical ROIs for the upcoming frame. Our method significantly reduces unnecessary data acquisition by concentrating dense LiDAR scanning only within these ROIs and sparsely sampling elsewhere. Experiments on the nuScenes and Lyft benchmarks demonstrate that our adaptive scanning strategy reduces LiDAR energy consumption by over 65% while maintaining competitive or even superior 3D object detection performance compared to traditional LiDAR-camera fusion methods with dense LiDAR scanning.
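To make the Gumbel-Softmax step more concrete, here is a minimal sketch of how a binary ROI mask could be sampled from per-region logits. It is not the authors' Mask Generator: the two-class (keep/drop) parameterization, the temperature `tau`, the 0.5 threshold, and all function names are assumptions chosen for illustration, and the code uses only the Python standard library, so no gradients are computed.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one Gumbel(0, 1) sample via inverse transform: -log(-log(U))."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    return -math.log(-math.log(u))


def gumbel_softmax_mask(logits, tau=1.0, seed=0):
    """Sample a relaxed and a binarized keep/drop mask.

    logits: list of (keep_logit, drop_logit) pairs, one per region
            (a hypothetical layout for this sketch).
    Returns (soft, hard): keep probabilities in [0, 1] and 0/1 decisions.
    """
    rng = random.Random(seed)
    soft, hard = [], []
    for keep_logit, drop_logit in logits:
        # Perturb each class logit with Gumbel noise and scale by temperature.
        a = (keep_logit + gumbel_noise(rng)) / tau
        b = (drop_logit + gumbel_noise(rng)) / tau
        # Numerically stable two-class softmax.
        m = max(a, b)
        ea, eb = math.exp(a - m), math.exp(b - m)
        p_keep = ea / (ea + eb)
        soft.append(p_keep)
        hard.append(1 if p_keep >= 0.5 else 0)
    return soft, hard


region_logits = [(50.0, -50.0), (-50.0, 50.0), (0.3, -0.2), (-1.0, 1.5)]
soft_mask, hard_mask = gumbel_softmax_mask(region_logits, tau=0.5, seed=0)
sparsity = 1.0 - sum(hard_mask) / len(hard_mask)  # fraction of regions dropped
```

In an automatic-differentiation framework, a common pattern is the straight-through estimator: the forward pass uses the hard mask while gradients flow through the soft probabilities. Whether the paper uses exactly this variant is not stated in the text above.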

Paper Structure

This paper contains 38 sections, 7 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Top: Conventional uniform LiDAR scanning. Bottom: Our adaptive LiDAR scanning, which leverages past frames to predict the next frame’s ROIs. ROIs are scanned densely and non-ROI areas receive sparse sampling.
  • Figure 2: (a) Overview of our adaptive LiDAR scanning pipeline. Prediction model utilizes historical information to identify ROIs for the upcoming frame, densely scanning those regions while sparsely sampling non-ROI sections. (b) The Query Prediction module takes the queries of $L$ transformer decoder layers of past frames from a memory buffer to predict the query stacks at time $t$, $Q'^t$, which are then fed to the Mask Generator to predict the final ROI mask.
  • Figure 3: Top: original images from two nuScenes validation examples. Bottom: corresponding selected LiDAR points that are densely scanned inside each ROI and sparsely elsewhere, with point colors encoding distance from the ego-vehicle (low resolution).
  • Figure 4: Ablation study comparing performance under different LiDAR levels on the nuScenes validation set.
  • Figure 5: Sparsity ratios vs. performance on the Lyft validation set.
  • ...and 4 more figures