An Instance-Centric Panoptic Occupancy Prediction Benchmark for Autonomous Driving

Yi Feng, Junwu E, Zizhan Guo, Yu Ma, Hanli Wang, Rui Fan

Abstract

Panoptic occupancy prediction aims to jointly infer voxel-wise semantics and instance identities within a unified 3D scene representation. However, progress in this field remains constrained by the absence of high-quality 3D mesh resources, instance-level annotations, and physically consistent occupancy datasets. Existing benchmarks typically provide incomplete, low-resolution geometry without instance-level annotations, limiting the development of models capable of precise geometric reconstruction, reliable occlusion reasoning, and holistic 3D understanding. To address these challenges, this paper presents an instance-centric benchmark for the 3D panoptic occupancy prediction task. Specifically, we introduce ADMesh, the first unified 3D mesh library tailored for autonomous driving, which integrates over 15K high-quality 3D models with diverse textures and rich semantic annotations. Building upon ADMesh, we further construct CarlaOcc, a large-scale, physically consistent panoptic occupancy dataset generated using the CARLA simulator. The dataset contains over 100K frames with fine-grained, instance-level occupancy ground truth at voxel resolutions as fine as 0.05 m. Furthermore, standardized evaluation metrics are introduced to quantify the quality of existing occupancy datasets. Finally, a systematic benchmark of representative models is established on the proposed dataset, providing a unified platform for fair comparison and reproducible research in 3D panoptic perception. Code and dataset are available at https://mias.group/CarlaOcc.
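
To make the panoptic label format concrete, the sketch below shows one plausible per-frame representation: a dense voxel grid that carries a semantic class and an instance identity for every voxel, packed into a single panoptic label. The grid shape, class IDs, and the encode_panoptic helper are illustrative assumptions for exposition, not the actual CarlaOcc schema.

```python
import numpy as np

# Hypothetical frame layout; extents, class IDs, and the encoding
# convention below are assumptions, not the CarlaOcc file format.
VOXEL_SIZE = 0.05                    # metres per voxel edge
GRID_SHAPE = (80, 80, 20)            # (X, Y, Z) voxels

FREE, ROAD, CAR, PEDESTRIAN = 0, 1, 2, 3  # toy semantic classes

semantics = np.zeros(GRID_SHAPE, dtype=np.uint8)    # class per voxel
instances = np.zeros(GRID_SHAPE, dtype=np.uint16)   # 0 = "no instance"

# Paint a flat road surface and two distinct car instances.
semantics[:, :, 0] = ROAD
semantics[10:20, 10:30, 1:5] = CAR
instances[10:20, 10:30, 1:5] = 1
semantics[40:50, 10:30, 1:5] = CAR
instances[40:50, 10:30, 1:5] = 2

def encode_panoptic(sem, inst, max_instances=1000):
    """Pack class and instance into one label per voxel, following the
    common panoptic convention label = class * max_instances + instance."""
    return sem.astype(np.uint32) * max_instances + inst

panoptic = encode_panoptic(semantics, instances)
print("occupied voxels:", int(np.count_nonzero(semantics)))
print("unique panoptic labels:", np.unique(panoptic))
```

The two cars share a semantic class but keep distinct panoptic labels, which is precisely the distinction a semantics-only occupancy benchmark cannot express.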

Figures (12)

  • Figure 1: Overview of the proposed benchmark. ADMesh provides the first large-scale, semantically structured 3D mesh library for autonomous driving. CarlaOcc leverages these assets to construct a multi-modal, high-fidelity, and physically consistent panoptic occupancy dataset, featuring variable voxel resolutions and rich instance-level annotations for comprehensive 3D perception benchmarking.
  • Figure 2: Overview of the proposed ADMesh library and the CarlaOcc generation pipeline. The ADMesh library is constructed by extracting and organizing diverse 3D assets from multiple sources, which are subsequently used to reconstruct dynamic scenes with both static structures and temporally aligned non-rigid motions. The resulting unified scene meshes are then used to rectify sensor artifacts and further processed with a topology-aware mesh permutation strategy to produce non-overlapping panoptic occupancy labels (a simplified sketch of this non-overlap constraint follows the list below).
  • Figure 3: Statistics of ADMesh and CarlaOcc: (a) semantic mesh distribution in ADMesh, (b) semantic voxel distribution in CarlaOcc, and (c) instance count distribution in CarlaOcc.
  • Figure 4: Visualization of data modalities in CarlaOcc.
  • Figure 5: Visualizations of instance-guided rectification of sensor artifacts: (a) RGB images; (b) and (d) the rendered depth and semantic maps from CARLA; (c) and (e) the corresponding refined results. Regions with incorrect semantics or depth values caused by transparency are highlighted in red, while instances that have lost their categorical labels are highlighted in blue.
  • ...and 7 more figures
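
Figure 2's pipeline ends by enforcing that every voxel belongs to at most one instance. The toy resolver below illustrates that non-overlap constraint only: voxels claimed by several instance masks are tie-broken by distance to each instance's centroid. This is a deliberately simplified stand-in, not the paper's topology-aware mesh permutation strategy; the function name and the centroid heuristic are assumptions for exposition.

```python
import numpy as np

def resolve_overlaps(instance_masks):
    """Assign each occupied voxel to exactly one instance.

    instance_masks: dict mapping instance_id -> boolean (X, Y, Z) array.
    Voxels claimed by a single instance keep that ID; contested voxels
    go to the instance whose occupied-voxel centroid is nearest. This
    centroid heuristic is a toy substitute for the topology-aware
    strategy described in the paper.
    """
    ids = sorted(instance_masks)
    shape = instance_masks[ids[0]].shape
    labels = np.zeros(shape, dtype=np.uint16)  # 0 = unassigned / free
    centroids = {i: np.argwhere(m).mean(axis=0)
                 for i, m in instance_masks.items()}

    claims = np.zeros(shape, dtype=np.uint8)   # how many masks claim a voxel
    for m in instance_masks.values():
        claims += m
    for idx in np.argwhere(claims > 0):
        x, y, z = idx
        claimants = [i for i in ids if instance_masks[i][x, y, z]]
        if len(claimants) == 1:
            labels[x, y, z] = claimants[0]
        else:  # contested voxel: nearest centroid wins
            labels[x, y, z] = min(
                claimants, key=lambda i: np.linalg.norm(idx - centroids[i]))
    return labels

# Two instance masks overlapping at one voxel.
a = np.zeros((4, 4, 4), dtype=bool); a[0:3, 0:3, 0] = True
b = np.zeros((4, 4, 4), dtype=bool); b[2:4, 2:4, 0] = True
labels = resolve_overlaps({1: a, 2: b})
# Every occupied voxel now carries exactly one instance ID.
assert int((labels > 0).sum()) == int(np.count_nonzero(a | b))
```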