$\alpha$-OCC: Uncertainty-Aware Camera-based 3D Semantic Occupancy Prediction

Sanbao Su, Nuo Chen, Chenchen Lin, Felix Juefei-Xu, Chen Feng, Fei Miao

TL;DR

This work tackles uncertainty in camera-based 3D Semantic Occupancy Prediction (OCC) by introducing Depth-UP, a direct-modeling framework that propagates depth uncertainty into both geometry completion and semantic segmentation, and HCP, a hierarchical conformal prediction method that yields principled, class-balanced uncertainty sets for OCC in highly imbalanced datasets. Together, Depth-UP and HCP form the α-OCC framework, achieving substantial accuracy gains (e.g., IoU improved by up to 11.58% and mIoU by up to 12.95%) and robust uncertainty quantification (e.g., up to 92% reduction in set size and 84% reduction in coverage gap) across multiple OCC models and datasets. The KL-based geometric score within HCP enables better occupied recall for rare safety-critical classes, while the semantic-level CP maintains per-class coverage when voxels are predicted as occupied. These results demonstrate that explicitly modeling and propagating depth uncertainty, coupled with hierarchical UQ, improves OCC robustness and safety-critical detection in autonomous perception systems, with potential applicability to other imbalanced 3D perception tasks.

Abstract

In the realm of autonomous vehicle perception, comprehending 3D scenes is paramount for tasks such as planning and mapping. Camera-based 3D Semantic Occupancy Prediction (OCC) aims to infer scene geometry and semantics from limited observations. While it has gained popularity due to affordability and rich visual cues, existing methods often neglect the inherent uncertainty in models. To address this, we propose an uncertainty-aware OCC method ($\alpha$-OCC). We first introduce Depth-UP, an uncertainty propagation framework that improves geometry completion by up to 11.58% and semantic segmentation by up to 12.95% across various OCC models. For uncertainty quantification (UQ), we propose the hierarchical conformal prediction (HCP) method, effectively handling the high-level class imbalance in OCC datasets. On the geometry level, the novel KL-based score function significantly improves the occupied recall (45%) of safety-critical classes with minimal performance overhead (3.4% reduction). On UQ, our HCP achieves smaller prediction set sizes while maintaining the defined coverage guarantee. Compared with baselines, it reduces set size by up to 92%, with an 18% further reduction when integrated with Depth-UP. Our contributions advance OCC accuracy and robustness, marking a noteworthy step forward in autonomous perception systems.

Paper Structure

This paper contains 25 sections, 1 theorem, 7 equations, 10 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

For a desired error rate $\alpha^y$, if we select $\alpha_o^y$ and $\alpha_s^y$ such that $1-\alpha^y = (1-\alpha_s^y)(1-\alpha_o^y)$, then the prediction set generated as in Eq. \ref{eq:semantic_prediction_set} satisfies $\mathbb{P}(\mathbf{Y}_{test} \in \mathcal{C}(\mathbf{X}_{test}) | \mathbf{Y}_{test} = y) \geq 1 - \alpha^y$.
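The relation $1-\alpha^y = (1-\alpha_s^y)(1-\alpha_o^y)$ in Proposition 1 can be read as splitting one class-level error budget between the occupancy (geometry) stage and the semantic stage. The sketch below is only an illustration of that arithmetic: the function name `split_error_rate` and the choice to fix $\alpha_o^y$ first and solve for $\alpha_s^y$ are assumptions, not something the paper prescribes.

```python
def split_error_rate(alpha: float, alpha_o: float) -> float:
    """Solve 1 - alpha = (1 - alpha_s) * (1 - alpha_o) for alpha_s.

    alpha   : desired overall error rate for one class (alpha^y)
    alpha_o : error rate assigned to the occupancy stage (alpha_o^y)
    Returns the error rate left for the semantic stage (alpha_s^y).
    """
    # alpha_o must not exceed alpha, otherwise alpha_s would be negative.
    if not 0.0 <= alpha_o <= alpha < 1.0:
        raise ValueError("require 0 <= alpha_o <= alpha < 1")
    return 1.0 - (1.0 - alpha) / (1.0 - alpha_o)


# Hypothetical numbers: a 10% overall budget with 5% spent on occupancy.
alpha_s = split_error_rate(alpha=0.10, alpha_o=0.05)
print(f"alpha_s = {alpha_s:.4f}")
```

Because the two coverage levels multiply, the two error rates sum to slightly more than the overall budget: the semantic stage here receives a little over 5%, not exactly 5%.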

Figures (10)

  • Figure 1: (a): As the percentage of depth uncertainty increases, the accuracy (mIoU$\uparrow$) of OCC decreases significantly. (b): High class imbalance in OCC. The percentage next to each class is its share of the SemanticKITTI dataset. Since the safety-critical class "bicyclist" occupies only 0.01%, the trained OCC model fails to detect the bicyclist in front, leading to a crash. However, after quantifying the uncertainty and post-processing with our HCP, the crash is avoided, because HCP improves the occupied recall of rare classes. When our Depth-UP and HCP are applied together, safety is further enhanced as the bicyclist is identified more accurately. In contrast, using only HCP often assigns the highest probability to the car (blue) for many bicyclist voxels. Due to visualization constraints, each occupied voxel is represented by the nonempty class with the highest probability in the predicted set from HCP.
  • Figure 2: Overview of our $\alpha$-OCC method. The non-black colors highlight the novelties and important techniques in our method. C denotes the concatenation of the depth feature $\mathbf{F}_D$ and image feature $\mathbf{F}_I$. In the Depth-UP part, we calculate the uncertainty of depth estimation through direct modeling. For depth model retraining, we only train the additional standard deviation head while keeping the rest of the model frozen. Then we propagate it through depth feature extraction (for semantic segmentation) and building a probabilistic voxel grid map $M_p$ by probabilistic geometry projection (for geometry completion). Each element of $M_p$ is the occupied probability of the corresponding voxel, computed by considering the depth distribution of all rays across the voxel.
  • Figure 3: Overview of our Hierarchical Conformal Prediction (HCP) method. We predict voxels' occupied state by the quantile on the KL-based score as Eq. \ref{eq:geometric_score}, which can improve occupied recall of rare classes, and then only generate prediction sets for these predicted occupied voxels. The occupied quantile $q_o^y$ and semantic quantile $q_s^y$ are computed during the calibration step of HCP.
  • Figure 4: Qualitative results of the base VoxFormer model and that with our Depth-UP.
  • Figure 5: Comparison of our KL-based score function with the class score function and the occupied score function. We evaluate OCC's geometry performance across different occupied recalls of the rare class. The red dotted line shows the IoU of the OCC model without CP. (a): Results on the basic VoxFormer across different datasets for the class person. (b): Results on SemanticKITTI across different basic OCC models for the class bicyclist.
  • ...and 5 more figures
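
The Figure 3 caption mentions per-class quantiles computed during a calibration step. As a rough illustration of how class-conditional quantiles are typically obtained in split conformal prediction, the sketch below calibrates one threshold per class on held-out voxels. It is not the paper's HCP: the nonconformity score `1 - p(true class)` stands in for the paper's score functions (the KL-based geometric score is not reproduced here), and all function names are hypothetical.

```python
import math
import numpy as np


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1.0 - alpha))
    if n == 0 or k > n:
        return math.inf  # too few calibration points: threshold admits everything
    return float(scores[k - 1])


def calibrate_per_class(probs, labels, alpha_by_class):
    """Compute one quantile per class from calibration voxels.

    probs          : (n, K) array of predicted class probabilities
    labels         : (n,) array of true class indices
    alpha_by_class : dict mapping class index -> error rate for that class
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Assumed score: one minus the probability assigned to the true class.
    true_scores = 1.0 - probs[np.arange(labels.size), labels]
    return {
        y: conformal_quantile(true_scores[labels == y], a)
        for y, a in alpha_by_class.items()
    }


def prediction_set(prob_row, quantiles):
    """Keep every class whose score falls within that class's own quantile."""
    return [y for y, q in quantiles.items() if 1.0 - prob_row[y] <= q]


# Toy calibration data with two classes (hypothetical numbers).
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4],
             [0.3, 0.7], [0.5, 0.5], [0.7, 0.3]]
cal_labels = [0, 0, 0, 1, 1, 1]
q = calibrate_per_class(cal_probs, cal_labels, {0: 0.25, 1: 0.25})
print(prediction_set([0.55, 0.45], q))
```

Calibrating each class separately is what lets a rare class receive its own threshold instead of inheriting one dominated by frequent classes; with very few calibration voxels for a class, the quantile falls back to infinity and that class is always included.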

Theorems & Definitions (2)

  • Proposition 1
  • proof