Out-of-Distribution Semantic Occupancy Prediction

Yuheng Zhang, Mengfei Duan, Kunyu Peng, Yuhang Wang, Ruiping Liu, Fei Teng, Kai Luo, Zhiyong Li, Kailun Yang

TL;DR

The work addresses the vulnerability of 3D semantic occupancy models to Out-of-Distribution (OoD) objects in urban driving by introducing Realistic Anomaly Augmentation to create OoD datasets and proposing OccOoD, a unified framework that fuses voxel and BEV representations through Cross-Space Semantic Refinement. It combines entropy- and cosine-based anomaly scoring with a geometry prior to detect OoD regions while maintaining competitive occupancy predictions. The approach achieves state-of-the-art OoD detection on both synthetic and real-world OoD datasets and demonstrates practical feasibility with real-time applicability. The datasets and source code are to be released publicly to support robust OoD evaluation and safe deployment in autonomous driving systems.
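As a rough illustration of what combining entropy- and cosine-based scoring with a geometry prior could look like for a single voxel, here is a minimal sketch. It is not the paper's formulation, which this summary does not spell out. The class prototypes, the weight `alpha`, the `occupied` flag standing in for the geometry prior, and all function names are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_score(logits):
    """Shannon entropy of the softmax distribution, normalized to [0, 1].

    Assumes at least two classes. High values mean the classifier is unsure.
    """
    probs = softmax(logits)
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def cosine_score(feature, prototypes):
    """One minus the best cosine similarity to any class prototype.

    The prototypes are hypothetical per-class reference vectors.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb + 1e-12)

    return 1.0 - max(cos(feature, p) for p in prototypes)


def anomaly_score(logits, feature, prototypes, occupied, alpha=0.5):
    """Weighted sum of both cues, gated by an assumed occupancy flag.

    Empty voxels get a score of zero, a simple stand-in for a geometry prior.
    """
    if not occupied:
        return 0.0
    return (alpha * entropy_score(logits)
            + (1.0 - alpha) * cosine_score(feature, prototypes))
```

Under these assumptions, a voxel with a confident prediction and a feature close to a prototype scores near zero, while a voxel with flat logits and a feature far from every prototype scores high.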

Abstract

3D semantic occupancy prediction is crucial for autonomous driving, providing a dense, semantically rich environmental representation. However, existing methods focus on in-distribution scenes, which leaves them susceptible to Out-of-Distribution (OoD) objects and long-tail distributions; anomalies may go undetected or be misinterpreted, posing safety hazards. To address these challenges, we introduce Out-of-Distribution Semantic Occupancy Prediction, targeting OoD detection in 3D voxel space. To fill dataset gaps, we propose Realistic Anomaly Augmentation, which injects synthetic anomalies while preserving realistic spatial and occlusion patterns, enabling the creation of two datasets: VAA-KITTI and VAA-KITTI-360. We then propose OccOoD, a novel framework that integrates OoD detection into 3D semantic occupancy prediction and uses Cross-Space Semantic Refinement (CSSR) to refine semantic predictions from complementary voxel and BEV representations, improving OoD detection. Experimental results demonstrate that OccOoD achieves state-of-the-art OoD detection with an AuROC of 65.50% and an AuPRCr of 31.83 within a 1.2 m region, while maintaining competitive semantic occupancy prediction performance and generalization in real-world urban driving scenes. The established datasets and source code will be made publicly available at https://github.com/7uHeng/OccOoD.

Paper Structure

This paper contains 16 sections, 12 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Visualization of the established task of Out-of-Distribution Semantic Occupancy Prediction. Mainstream semantic occupancy prediction methods tend to misclassify Out-of-Distribution (OoD) objects as inliers, thus endangering the safety of autonomous driving. The proposed OccOoD accurately identifies OoD objects, with anomaly scores visualized from low (blue) to high (red).
  • Figure 2: An illustration of the proposed Realistic Anomaly Augmentation. This pipeline is designed to address the challenges of collecting real-world OoD data by synthesizing anomalies that are physically plausible and contextually realistic.
  • Figure 3: Distribution map of VAA-KITTI and VAA-KITTI-360, containing $26$ distinct OoD categories grouped into five main types as shown.
  • Figure 4: An overview of the proposed OccOoD framework. The image encoder extracts 2D features and converts them into 3D features via View Transformation, guided by geometric occupancy. Seed voxels and other features are processed through Geometry Pairing and Semantic Refinement to generate enhanced BEV and voxel features. Finally, Cross-View Feature Synergy integrates these representations for rich geometry and high spatial efficiency.
  • Figure 5: Visualization of the results on the VAA-KITTI and VAA-STU datasets. From left to right are the input RGB images, the SGN [mei2024camera] results, the OccOoD (ours) results, and the OoD ground truths. Zoom in for a better view.
  • ...and 1 more figures