SeSame: Simple, Easy 3D Object Detection with Point-Wise Semantics
Hayeon O, Chanuk Yang, Kunsoo Huh
TL;DR
This work tackles the limited semantic context in LiDAR-only 3D object detectors by injecting per-point semantics extracted from LiDAR semantic segmentation into existing detectors, without requiring camera–LiDAR calibration. The SeSame pipeline uses Cylinder3D to generate per-point labels, maps them to KITTI classes, and concatenates the resulting semantic features with raw point coordinates to augment PointRCNN, SECOND, and PointPillar-based detectors. Evaluations on KITTI show consistent improvements over baselines and several multimodal methods across BEV and 3D detection metrics, particularly for cars. Ablations show that one-hot label encodings are more robust than soft scores. Further analysis indicates that image semantics work better for sparse objects, while LiDAR semantics handle occlusion of larger objects better, which highlights their complementary strengths. The approach demonstrates the viability of calibration-free semantic augmentation for LiDAR-only detection, though it relies on semantic annotations; future work targets self-supervised multimodal semantic segmentation as a pretext task to reduce labeling costs.
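The augmentation step described above (map per-point labels to detector classes, one-hot encode them, concatenate with the raw points) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the label IDs in `SEG_TO_DET`, the background slot, the `augment_points` helper, and the assumption that each point carries x, y, z, and intensity are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical mapping from segmentation label IDs to detector classes.
# The IDs are placeholders, not the mapping used in the paper.
# Detector class 0 is treated as background.
SEG_TO_DET = {1: 1, 2: 2, 3: 3}  # e.g. car -> Car, person -> Pedestrian, bicyclist -> Cyclist
NUM_DET_CLASSES = 4  # background + Car + Pedestrian + Cyclist


def augment_points(points, seg_labels):
    """Append one-hot semantic features to raw LiDAR points.

    points: (N, 4) array of x, y, z, intensity (assumed layout).
    seg_labels: (N,) integer per-point segmentation labels.
    Returns an (N, 4 + NUM_DET_CLASSES) float32 array.
    """
    points = np.asarray(points, dtype=np.float32)
    seg_labels = np.asarray(seg_labels)
    if points.shape[0] != seg_labels.shape[0]:
        raise ValueError("points and seg_labels must have the same length")

    # Map each segmentation label to a detector class; unmapped labels become background.
    det_labels = np.array(
        [SEG_TO_DET.get(int(label), 0) for label in seg_labels], dtype=np.int64
    )

    # Row i of the identity matrix is the one-hot vector for class i.
    one_hot = np.eye(NUM_DET_CLASSES, dtype=np.float32)[det_labels]

    # Concatenate semantics with the raw point features along the channel axis.
    return np.concatenate([points, one_hot], axis=1)
```

A detector consuming these points would need its input channel count widened from 4 to 4 + `NUM_DET_CLASSES`. Replacing the one-hot rows with per-class soft scores would give the alternative encoding that the ablation compares against.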
Abstract
In autonomous driving, 3D object detection provides more precise information than 2D object detection for downstream tasks such as path planning and motion estimation. In this paper, we propose SeSame, a method for enriching the semantic information available to existing LiDAR-only 3D object detectors. It addresses a limitation of existing 3D detectors: they focus primarily on object presence and classification, and therefore do not capture the relationships between the elemental units that constitute the data, as semantic segmentation does. Experiments demonstrate the effectiveness of our method, with performance improvements on the KITTI object detection benchmark. Our code is available at https://github.com/HAMA-DL-dev/SeSame
