3D Object Visibility Prediction in Autonomous Driving

Chuanyu Luo, Nuo Cheng, Ren Zhong, Haipeng Jiang, Wenyu Chen, Aoli Wang, Pu Li

TL;DR

The paper tackles the prediction of 3D object visibility in autonomous driving. It argues that visibility should be integrated into the 3D bounding box prediction stage rather than computed in post-processing. Visibility is defined through a projection of 3D boxes onto a sphere (Definition 3), and occlusion is measured with solid angles: $\Omega_i = A_i / r^2$, where on the unit sphere ($r = 1$) the projected area is $A_i = \iint_S \sin\theta \, d\theta \, d\phi$. The visibility of box $i$ is $V_i = (\Omega_i - \Omega_i \cap (\bigcup_{j \in C_i} \Omega_j)) / \Omega_i$, where $C_i$ is the set of boxes that may occlude box $i$. For $N$ boxes this algorithm has $O(N^2)$ complexity. To predict visibility online, the paper uses multi-task learning and adds a lightweight MLP head with $O(1)$ additional inference time. The approach is sensor-agnostic (it applies to LiDAR, image, or fused pipelines) and has a negligible effect on accuracy and speed on KITTI with PointPillars and SECOND. Predicted visibility agrees closely with the algorithm-derived ground truth, which supports safer downstream planning with minimal performance trade-offs.
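
To make the solid-angle formulas concrete, here is a worked example (our illustration with made-up numbers, not results from the paper). On the unit sphere ($r = 1$) the solid angle equals the projected area. For a region that happens to span a rectangle in $(\theta, \phi)$, the integral has a closed form, and integrating over the whole sphere recovers $4\pi$ as a sanity check. If, hypothetically, box $i$ subtends $\Omega_i = 0.04$ sr and $0.01$ sr of that is covered by boxes in $C_i$, its visibility is 0.75:

```latex
A = \int_{\phi_1}^{\phi_2} \int_{\theta_1}^{\theta_2} \sin\theta \, d\theta \, d\phi
  = (\phi_2 - \phi_1)(\cos\theta_1 - \cos\theta_2),
\qquad
A_{\text{sphere}} = 2\pi \, (\cos 0 - \cos\pi) = 4\pi

V_i = \frac{\Omega_i - \Omega_i \cap \bigcup_{j \in C_i} \Omega_j}{\Omega_i}
    = \frac{0.04 - 0.01}{0.04} = 0.75
```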

Abstract

With the rapid advancement of hardware and software technologies, research in autonomous driving has grown significantly. The prevailing framework for multi-sensor autonomous driving comprises sensor installation, perception, path planning, decision-making, and motion control. In the perception phase, a common approach uses neural networks to infer 3D bounding box (Bbox) attributes, such as class, size, and orientation, from raw sensor data. In this paper, we present a novel attribute, 3D object visibility, together with an algorithm to compute it. Because visibility is learned through multi-task learning, adding this attribute has a negligible effect on the model's accuracy and efficiency. The proposed attribute and its computational strategy aim to extend the capabilities of downstream tasks, thereby improving the safety and reliability of real-time autonomous driving in real-world scenarios.

Paper Structure (14 sections, 4 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Three distinct definitions of visibility. For illustration, Definitions 2 and 3 are drawn from a top-down perspective; in our study, however, both definitions are applied and analyzed in three dimensions.
  • Figure 2: The visibility algorithm projects the red and blue Bboxes onto a unit sphere centered at the origin. The blue Bbox has 100% visibility because no other Bbox is closer to the origin. The red Bbox is partially occluded: its unoccluded area is its total projected area minus the area it shares with the blue Bbox (shown with a color gradient). Its visibility is the ratio of the area occupied exclusively by the red projection to its entire projected area.
  • Figure 3: The traditional workflow for computing visibility. In the prediction stage, a deep learning model such as PointPillars lang2019pointpillars or SECOND yan2018second predicts the Bbox. The proposed algorithm then computes the visibility of the predicted Bbox in the post-prediction stage.
  • Figure 4: The proposed workflow for computing visibility. Visibility is computed offline by the algorithm and used as a training target in multi-task learning. During real-time inference, visibility is predicted by an additional simple MLP layer.
  • Figure 5: Qualitative examples of predicted Bboxes, predicted visibility, and ground-truth (GT) visibility, where GT visibility is computed by the algorithm from the GT Bboxes. If a predicted Bbox has no corresponding GT Bbox, its GT visibility is marked as -. These examples show that the visibility predicted by the multi-task learning model closely matches the GT. The three heavily occluded objects in blue circles illustrate information that may be helpful for downstream tasks in real-world scenarios.
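
The projection-and-subtraction procedure described for Figure 2 can be sketched as a sampling-based approximation. This is a minimal sketch under our own assumptions and not necessarily how the authors compute projected areas. The sensor sits at the origin. Each Bbox is given as (center, size (l, w, h) along its local x, y, z axes, yaw about z). The unit sphere is discretized on a θ–φ midpoint grid, so each cell contributes $\sin\theta \, \Delta\theta \, \Delta\phi$ to the solid angle. $C_i$ is taken to be the boxes whose centers are closer to the origin than box $i$'s center. The helper names `sphere_grid`, `box_hit_mask`, and `visibility` are hypothetical.

```python
import numpy as np

def sphere_grid(n_theta=180, n_phi=360):
    """Midpoint grid on the unit sphere: unit directions and per-cell solid angles."""
    dt, dp = np.pi / n_theta, 2.0 * np.pi / n_phi
    theta = (np.arange(n_theta) + 0.5) * dt          # polar angle measured from +z
    phi = (np.arange(n_phi) + 0.5) * dp              # azimuth in the x-y plane
    T, P = np.meshgrid(theta, phi, indexing="ij")
    dirs = np.stack([np.sin(T) * np.cos(P),
                     np.sin(T) * np.sin(P),
                     np.cos(T)], axis=-1).reshape(-1, 3)
    d_omega = (np.sin(T) * dt * dp).reshape(-1)      # sin(theta) dtheta dphi with r = 1
    return dirs, d_omega

def box_hit_mask(center, size, yaw, dirs):
    """True for each direction whose ray from the origin intersects the box.

    Assumed box convention (hypothetical): size = (l, w, h) along the box's
    local x, y, z axes, yaw = rotation about z. Uses the slab test in the box frame.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    to_local = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])  # R(-yaw)
    o = -(to_local @ np.asarray(center, dtype=float))  # sensor origin in box frame
    d = dirs @ to_local.T                               # ray directions in box frame
    half = np.asarray(size, dtype=float) / 2.0
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = (-half - o) / d
        t2 = (half - o) / d
        t_near = np.nanmax(np.minimum(t1, t2), axis=1)
        t_far = np.nanmin(np.maximum(t1, t2), axis=1)
    return t_far >= np.maximum(t_near, 0.0)             # hit in front of the sensor

def visibility(boxes, n_theta=180, n_phi=360):
    """V_i = (Omega_i - |Omega_i ∩ union_{j in C_i} Omega_j|) / Omega_i per box.

    Assumption: C_i = boxes whose centers are closer to the origin than box i's.
    """
    dirs, d_omega = sphere_grid(n_theta, n_phi)
    masks = [box_hit_mask(c, s, y, dirs) for c, s, y in boxes]
    dists = [np.linalg.norm(c) for c, _, _ in boxes]
    result = []
    for i, mask_i in enumerate(masks):
        omega_i = d_omega[mask_i].sum()               # solid angle of box i's projection
        if omega_i == 0.0:                            # smaller than one grid cell
            result.append(float("nan"))
            continue
        occluders = np.zeros_like(mask_i)
        for j, mask_j in enumerate(masks):            # O(N^2) pairwise pass
            if j != i and dists[j] < dists[i]:
                occluders |= mask_j                   # union of closer projections
        hidden = d_omega[mask_i & occluders].sum()    # occluded part of Omega_i
        result.append(float((omega_i - hidden) / omega_i))
    return result
```

For example, a 2 m cube at (10, 0, 0) is fully visible, an identical cube directly behind it at (20, 0, 0) gets visibility 0, and one at (20, 2, 0) falls strictly between 0 and 1. The grid resolution trades accuracy for speed. An exact geometric computation of the projected polygons would avoid discretization error, at the cost of more complex code.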
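
The multi-task setup in Figure 4 can be sketched as a small extra head on the detector's per-box features, trained against the offline, algorithm-computed visibility. This NumPy sketch shows only the forward pass and the combined loss. The class `VisibilityHead`, the feature and hidden sizes, the sigmoid output, the L1 visibility loss, and `vis_weight` are all illustrative assumptions, not the paper's exact design.

```python
import numpy as np

class VisibilityHead:
    """Hypothetical lightweight MLP head mapping per-box features to visibility in (0, 1)."""

    def __init__(self, in_dim=64, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, feats):
        """feats: (num_boxes, in_dim) features from the shared detection backbone."""
        h = np.maximum(feats @ self.w1 + self.b1, 0.0)   # ReLU hidden layer
        logits = (h @ self.w2 + self.b2)[:, 0]           # one scalar per box
        return 1.0 / (1.0 + np.exp(-logits))             # sigmoid keeps output in (0, 1)

def multitask_loss(det_loss, pred_vis, gt_vis, vis_weight=1.0):
    """Existing detection loss plus a weighted L1 term on visibility (assumed loss form)."""
    vis_loss = np.mean(np.abs(pred_vis - gt_vis))
    return det_loss + vis_weight * vis_loss
```

The per-box cost is a fixed-size matrix product that does not depend on how many other objects are in the scene. This is the intuition behind replacing the $O(N^2)$ pairwise computation at inference time with a learned prediction.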