Adaptive LiDAR Scanning: Harnessing Temporal Cues for Efficient 3D Object Detection via Multi-Modal Fusion
Sara Shoouri, Morteza Tavakoli Taba, Hun-Seok Kim
TL;DR
The paper tackles the energy and bandwidth costs of dense LiDAR scans in multi-modal 3D object detection by exploiting temporal continuity. It introduces a two-stage adaptive sensing pipeline comprising a History-Aware Query Predictor and a Differentiable Mask Generator, which produces ROI masks $M^t$ via Gumbel-Softmax to guide frame-by-frame LiDAR sampling, complemented by differentiable voxelization and a CVaR-based loss for robustness. Integrated with a state-of-the-art camera-LiDAR fusion backbone, the method achieves over $65\%$ LiDAR sparsity on nuScenes and Lyft without sacrificing, and often improving, detection accuracy. This approach reduces sensor data and computational load, enabling more practical deployment of perception systems on energy-constrained platforms while maintaining high detection performance.
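To make the "CVaR-based loss" mentioned above concrete, the following is a minimal sketch of the standard empirical Conditional Value at Risk: the mean of the worst $(1-\alpha)$ fraction of per-sample losses. The summary does not give the paper's exact formulation, so the function name `cvar`, the parameter `alpha`, and the tail-size rounding are illustrative assumptions, not the authors' implementation.

```python
import math


def cvar(losses, alpha=0.9):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses.

    Assumption: this is the textbook tail-mean estimator, not necessarily
    the exact loss used in the paper.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    # Size of the tail, rounded up so at least one sample is always kept.
    k = max(1, math.ceil((1.0 - alpha) * len(losses)))
    # Largest losses first; keep only the k worst cases.
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k
```

With `alpha=0` this reduces to the ordinary mean loss; raising `alpha` focuses training on the hardest samples, which is one plausible reading of how such a loss adds robustness.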
Abstract
Multi-sensor fusion using LiDAR and RGB cameras significantly enhances 3D object detection. However, conventional LiDAR sensors perform dense, stateless scans, ignoring the strong temporal continuity in real-world scenes. This leads to substantial sensing redundancy and excessive power consumption, limiting their practicality on resource-constrained platforms. To address this inefficiency, we propose a predictive, history-aware adaptive scanning framework that anticipates informative regions of interest (ROIs) based on past observations. Our approach introduces a lightweight predictor network that distills historical spatial and temporal context into refined query embeddings. These embeddings guide a differentiable Mask Generator network, which leverages Gumbel-Softmax sampling to produce binary masks identifying critical ROIs for the upcoming frame. Our method significantly reduces unnecessary data acquisition by concentrating dense LiDAR scanning within these ROIs and sampling sparsely elsewhere. Experiments on the nuScenes and Lyft benchmarks demonstrate that our adaptive scanning strategy reduces LiDAR energy consumption by over 65% while maintaining competitive or even superior 3D object detection performance compared to traditional LiDAR-camera fusion methods with dense LiDAR scanning.
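The mask-sampling step described in the abstract can be sketched as follows. This is a forward-pass illustration only, written in plain Python under stated assumptions: each scan cell has a single "keep" logit competing against a fixed "skip" logit of 0, and the names `sample_roi_mask`, `keep_logits`, `tau`, and `sparsity` are hypothetical. In a real training setup the hard mask would be paired with the soft probabilities through a straight-through estimator in an autodiff framework; that part is not shown here.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one Gumbel(0, 1) sample; u is clamped away from 0 to avoid log(0)."""
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def sample_roi_mask(keep_logits, tau=1.0, seed=None):
    """Two-class Gumbel-Softmax per cell: returns (soft probabilities, hard 0/1 mask)."""
    rng = random.Random(seed)
    soft, hard = [], []
    for logit in keep_logits:
        # Perturb both class logits with Gumbel noise, then scale by temperature.
        z_keep = (logit + gumbel_noise(rng)) / tau
        z_skip = (0.0 + gumbel_noise(rng)) / tau
        # Numerically stable two-way softmax.
        m = max(z_keep, z_skip)
        e_keep = math.exp(z_keep - m)
        e_skip = math.exp(z_skip - m)
        p_keep = e_keep / (e_keep + e_skip)
        soft.append(p_keep)
        # Hard decision: scan this cell densely only if "keep" wins.
        hard.append(1 if p_keep > 0.5 else 0)
    return soft, hard


def sparsity(mask):
    """Fraction of cells that are not densely scanned."""
    return 1.0 - sum(mask) / len(mask)
```

Lower values of `tau` push the soft probabilities toward 0 or 1, so the relaxed output more closely matches the binary mask, at the cost of noisier gradients during training. The `sparsity` helper corresponds to the kind of quantity behind the "over 65%" figure, although the exact definition used in the paper is not given in this summary.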
