Image-Guided Semantic Pseudo-LiDAR Point Generation for 3D Object Detection
Minseung Lee, Seokha Moon, Seung Joon Lee, Reza Mahjourian, Jinkyu Kim
TL;DR
LiDAR sparsity hampers reliable 3D object detection, especially for small or distant objects. The authors propose ImagePG, a framework that generates dense, semantically meaningful pseudo-LiDAR points by fusing RGB image semantics with LiDAR. It combines three modules: Image-Guided RoI Points Generation (IG-RPG), an Image-Aware Occupancy Prediction Network (I-OPN), and multi-stage refinement (MR), guided by deformable attention and bird's-eye-view (BEV) priors. The approach improves detection on KITTI and Waymo, cutting false positives by nearly 50% and achieving state-of-the-art cyclist detection on the KITTI leaderboard, while remaining compatible with multiple backbones. Overall, ImagePG performs robustly across datasets and offers a practical, modality-aware enhancement for multi-modal 3D perception in autonomous driving.
Abstract
In autonomous driving scenarios, accurate perception is increasingly critical for safe navigation. While LiDAR provides precise spatial data, its inherent sparsity makes it difficult to detect small or distant objects. Existing methods try to address this by generating additional points within a Region of Interest (RoI), but relying on LiDAR alone often leads to false positives and a failure to recover meaningful structures. To address these limitations, we propose the Image-Guided Semantic Pseudo-LiDAR Point Generation model, called ImagePG, a novel framework that leverages rich RGB image features to generate dense and semantically meaningful 3D points. Our framework includes an Image-Guided RoI Points Generation (IG-RPG) module, which creates pseudo-points guided by image features, and an Image-Aware Occupancy Prediction Network (I-OPN), which provides spatial priors to guide point placement. A multi-stage refinement (MR) module further enhances point quality and detection robustness. To the best of our knowledge, ImagePG is the first method to directly leverage image features for point generation. Extensive experiments on the KITTI and Waymo datasets demonstrate that ImagePG significantly improves the detection of small and distant objects such as pedestrians and cyclists, reducing false positives by nearly 50%. On the KITTI test set, our framework improves mAP over the baseline by 1.38 (car), 7.91 (pedestrian), and 5.21 (cyclist) percentage points, achieving state-of-the-art cyclist performance on the KITTI leaderboard. The code is available at: https://github.com/MS-LIMA/ImagePG
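
To make the idea of occupancy-guided point placement inside an RoI concrete, the following is a minimal NumPy sketch. It is not the ImagePG implementation: the function names (`generate_roi_candidates`, `filter_by_occupancy`, `toy_occupancy`), the regular-grid sampling, the axis-aligned box, and the 0.5 threshold are all assumptions made for illustration, and the learned I-OPN is replaced by a hand-written toy function.

```python
import numpy as np

def generate_roi_candidates(center, size, points_per_axis=4):
    """Return a regular grid of candidate points inside an axis-aligned RoI box.

    Hypothetical helper: a learned generator would replace this fixed grid.
    """
    center = np.asarray(center, dtype=np.float64)
    size = np.asarray(size, dtype=np.float64)
    # Cell-centred offsets in (-0.5, 0.5) along each axis.
    ticks = (np.arange(points_per_axis) + 0.5) / points_per_axis - 0.5
    grid = np.stack(np.meshgrid(ticks, ticks, ticks, indexing="ij"), axis=-1)
    return center + grid.reshape(-1, 3) * size

def filter_by_occupancy(candidates, occupancy_fn, threshold=0.5):
    """Keep only candidates whose occupancy score reaches the threshold."""
    scores = occupancy_fn(candidates)
    return candidates[scores >= threshold]

def toy_occupancy(points):
    """Stand-in for a learned occupancy prior: occupied only where z < 0."""
    return (points[:, 2] < 0.0).astype(np.float64)

# A 4 m x 2 m x 2 m box centred at (10, 2, 0) in LiDAR coordinates (made-up values).
candidates = generate_roi_candidates(center=(10.0, 2.0, 0.0), size=(4.0, 2.0, 2.0))
pseudo_points = filter_by_occupancy(candidates, toy_occupancy)
print(candidates.shape, pseudo_points.shape)
```

In this toy setting the occupancy prior simply discards the upper half of the box. In the framework described above, the prior would instead come from a network that also sees image features, which is what lets it suppress points in RoIs that contain no real object.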
