Better Monocular 3D Detectors with LiDAR from the Past
Yurong You, Cheng Perng Phoo, Carlos Andres Diaz-Ruiz, Katie Z Luo, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q Weinberger
TL;DR
This work introduces AsyncDepth, a method that addresses the fundamental depth ambiguity of monocular 3D detection by leveraging unlabeled LiDAR data from past traversals of the same location. The method builds asynchronous depth features in three steps: it densifies the past LiDAR point clouds, projects them into the current camera view to form depth maps, and learns depth-aware representations from those maps. These representations are fused with the current image features in an end-to-end framework. Across Lyft L5 and Ithaca365, AsyncDepth consistently improves two representative monocular detectors, FCOS3D and Lift-Splat (LSS), with low latency and modest storage costs, achieving gains of up to 9.5 mAP on far-range detections. The results point to the practical potential of community-based LiDAR data sharing to upgrade camera-only perception, enabling cheaper autonomous systems without sacrificing performance.
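The projection step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `project_to_depth_map`, its parameters, and the pinhole camera model are assumptions made for illustration, and the densification and learned feature extraction stages are omitted. It takes a point cloud from a past traversal in world coordinates, transforms it into the current camera frame, and keeps the nearest depth at each pixel.

```python
import numpy as np


def project_to_depth_map(points_world, world_to_cam, K, height, width):
    """Render a sparse depth map from a 3D point cloud (hypothetical helper).

    points_world: (N, 3) points in world coordinates, e.g. from a past traversal.
    world_to_cam: (4, 4) rigid transform from world to the current camera frame.
    K:            (3, 3) pinhole intrinsics (assumed camera model).
    Returns a (height, width) float32 array; 0 marks pixels with no point.
    """
    # Move the points into the camera frame using homogeneous coordinates.
    ones = np.ones((len(points_world), 1))
    pts_h = np.hstack([points_world, ones])
    pts_cam = (world_to_cam @ pts_h.T).T[:, :3]

    # Discard points behind (or extremely close to) the image plane.
    z = pts_cam[:, 2]
    in_front = z > 1e-3
    pts_cam, z = pts_cam[in_front], z[in_front]

    # Perspective projection to integer pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u = np.floor(uvw[:, 0] / z).astype(int)
    v = np.floor(uvw[:, 1] / z).astype(int)

    # Keep only points that land inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # When several points hit the same pixel, keep the nearest one.
    depth = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(depth, (v, u), z.astype(np.float32))
    depth[np.isinf(depth)] = 0.0
    return depth
```

With multiple past traversals, one such depth map would be rendered per traversal, and a small network would then encode the maps into features for fusion with the image branch. How the maps are aggregated is a design choice this sketch does not cover.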
Abstract
Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer from inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data. Specifically, at inference time, we assume that the camera-based detectors have access to multiple unlabeled LiDAR scans from past traversals at locations of interest (potentially from other high-end vehicles equipped with LiDAR sensors). Under this setup, we propose a novel, simple, and end-to-end trainable framework, termed AsyncDepth, to effectively extract relevant features from asynchronous LiDAR traversals of the same location for monocular 3D detectors. We show consistent and significant performance gains (up to 9 AP) across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
