Frustum PointNets for 3D Object Detection from RGB-D Data
Charles R. Qi, Wei Liu, Chenxia Wu, Hao Su, Leonidas J. Guibas
TL;DR
Frustum PointNets detect 3D objects in RGB-D data by lifting each 2D detection into a 3D frustum, running PointNet-based 3D instance segmentation on the points inside that frustum, and then estimating an amodal 3D box. A T-Net translates the segmented object points into a frame centered on the object, and a corner loss regularizes the joint optimization of box center, size, and heading; all components are trained with a multi-task loss. The approach reports state-of-the-art results on the KITTI and SUN RGB-D benchmarks, runs in real time, and remains robust under strong occlusion and with very sparse points. By combining 2D proposals with PointNet processing of raw point clouds, it keeps geometric reasoning in 3D space and works in both outdoor and indoor scenes, which makes it relevant to autonomous driving and robotics.
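To make the first step concrete, the following is a minimal sketch of how a 2D detection box can select the points of a 3D frustum. It is not the authors' implementation. It assumes a distortion-free pinhole camera, points already expressed in the camera frame (z pointing forward), and a hypothetical helper name `frustum_mask`; the intrinsics and box values in the usage lines are made up for illustration.

```python
import numpy as np


def frustum_mask(points_cam, K, box2d):
    """Boolean mask of points whose image projection lies inside a 2D box.

    points_cam: (N, 3) array of points in the camera frame (z forward).
    K:          (3, 3) pinhole intrinsic matrix (assumed, no distortion).
    box2d:      (xmin, ymin, xmax, ymax) in pixels, from any 2D detector.
    """
    xmin, ymin, xmax, ymax = box2d
    z = points_cam[:, 2]
    in_front = z > 0  # points behind the camera are never in the frustum

    # Pinhole projection: [u*z, v*z, z]^T = K @ [x, y, z]^T.
    # Replace non-positive depths by 1.0 only to avoid dividing by zero;
    # those points are rejected by `in_front` anyway.
    safe_z = np.where(in_front, z, 1.0)
    uvw = points_cam @ K.T
    u = uvw[:, 0] / safe_z
    v = uvw[:, 1] / safe_z

    in_box = (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    return in_front & in_box


# Illustrative values only (not taken from KITTI or SUN RGB-D calibration).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 10.0],    # projects to the box center
                   [5.0, 0.0, 10.0],    # projects far to the right of the box
                   [0.0, 0.0, -10.0],   # behind the camera
                   [0.5, 0.2, 20.0]])   # inside the box, farther away
mask = frustum_mask(points, K, (550, 150, 650, 210))
frustum_points = points[mask]  # input to the 3D segmentation stage
```

Because the mask depends only on the projected pixel coordinates, points at any depth along the viewing rays through the box are kept, which is what makes the selected region a frustum rather than a box.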
Abstract
In this work, we study 3D object detection from RGB-D data in both indoor and outdoor scenes. While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, we directly operate on raw point clouds by popping up RGB-D scans. However, a key challenge of this approach is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). Instead of solely relying on 3D proposals, our method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects. Benefited from learning directly in raw point clouds, our method is also able to precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. Evaluated on KITTI and SUN RGB-D 3D detection benchmarks, our method outperforms the state of the art by remarkable margins while having real-time capability.
