3D Object Visibility Prediction in Autonomous Driving
Chuanyu Luo, Nuo Cheng, Ren Zhong, Haipeng Jiang, Wenyu Chen, Aoli Wang, Pu Li
TL;DR
The paper addresses 3D object visibility prediction in autonomous driving and argues for predicting visibility in the 3D bounding box prediction stage rather than in post-processing. It defines visibility using a 3D-to-sphere projection (Definition 3) and computes occlusion via solid angles: $\Omega_i = A_i / r^2$, where $A_i = r^2 \iint_S \sin\theta \, d\theta \, d\phi$ is the area of the object's projection on a sphere of radius $r$, and $V_i = (\Omega_i - \Omega_i \cap (\bigcup_{j \in C_i} \Omega_j)) / \Omega_i$. This ground-truth computation has $O(N^2)$ complexity. To predict visibility online, the paper uses multi-task learning and adds a lightweight MLP with $O(1)$ inference time. The approach is sensor-agnostic, applying to LiDAR, image, or fused pipelines, and it incurs negligible cost in accuracy and speed on benchmarks such as KITTI with PointPillars and SECOND. The results show close agreement between predicted visibility and the algorithm-derived ground truth, supporting safer downstream planning with minimal performance trade-offs.
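The solid-angle visibility ratio above can be illustrated with a small numerical sketch. This is not the paper's implementation: the function name `visibility`, the representation of each object's spherical projection as an axis-aligned rectangle in $(\theta, \phi)$, the grid resolution, and the choice of $C_i$ as "all objects nearer to the sensor than object $i$" are assumptions made only for illustration. Azimuth wrap-around is ignored for brevity.

```python
import numpy as np

def visibility(footprints, n_theta=180, n_phi=360):
    """Approximate V_i for each object on a discretized sphere.

    footprints: list of (distance, theta_min, theta_max, phi_min, phi_max),
    angles in radians. The rectangular footprint is an illustrative
    assumption, not the projection used in the paper.
    """
    d_theta = np.pi / n_theta
    d_phi = 2.0 * np.pi / n_phi
    theta = (np.arange(n_theta) + 0.5) * d_theta  # polar angle cell centres
    phi = (np.arange(n_phi) + 0.5) * d_phi        # azimuth cell centres

    # Solid angle of each grid cell: sin(theta) * d_theta * d_phi.
    cell = np.sin(theta)[:, None] * d_theta * d_phi * np.ones((1, n_phi))

    # Boolean mask of the cells covered by each object's footprint.
    masks = []
    for _, t0, t1, p0, p1 in footprints:
        in_theta = (theta >= t0) & (theta < t1)
        in_phi = (phi >= p0) & (phi < p1)
        masks.append(in_theta[:, None] & in_phi[None, :])

    dist = [f[0] for f in footprints]
    result = []
    for i, m_i in enumerate(masks):
        omega_i = cell[m_i].sum()
        if omega_i == 0.0:
            # Footprint smaller than one grid cell: ratio is undefined here.
            result.append(float("nan"))
            continue
        # Union of the footprints of all nearer objects (assumed C_i).
        occluders = np.zeros_like(m_i)
        for j, m_j in enumerate(masks):
            if j != i and dist[j] < dist[i]:
                occluders |= m_j
        occluded = cell[m_i & occluders].sum()
        result.append(float((omega_i - occluded) / omega_i))
    return result

# Two objects near the horizon; the nearer one covers half of the farther one.
deg = np.deg2rad
near = (5.0, deg(80), deg(100), deg(0), deg(20))
far = (10.0, deg(80), deg(100), deg(10), deg(30))
print(visibility([near, far]))  # approximately [1.0, 0.5]
```

The pairwise loop over objects is what gives the quadratic cost in the number of objects in this sketch; a learned visibility head avoids running it at inference time.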
Abstract
With the rapid advancement of hardware and software technologies, research in autonomous driving has seen significant growth. The prevailing framework for multi-sensor autonomous driving encompasses sensor installation, perception, path planning, decision-making, and motion control. In the perception phase, a common approach is to use neural networks to infer 3D bounding box (Bbox) attributes, including classification, size, and orientation, from raw sensor data. In this paper, we present a novel attribute and its corresponding algorithm: 3D object visibility. Because the attribute is learned through multi-task learning, introducing it has a negligible effect on the model's effectiveness and efficiency. Our proposal of this attribute and its computational strategy aims to expand the capabilities available to downstream tasks, thereby enhancing the safety and reliability of real-time autonomous driving in real-world scenarios.
