
SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving

Yiming Li, Sihang Li, Xinhao Liu, Moonjun Gong, Kenan Li, Nuo Chen, Zijun Wang, Zhiheng Li, Tao Jiang, Fisher Yu, Yue Wang, Hang Zhao, Zhiding Yu, Chen Feng

TL;DR

SSCBench addresses the lack of large-scale outdoor 3D semantic scene completion data by aggregating KITTI-360, nuScenes, and Waymo into a unified benchmark that supports monocular, trinocular, and LiDAR inputs with standardized cross-domain labels. It provides roughly 66,913 frames across three subsets and a consistent voxel-based evaluation framework to assess geometry and semantics within a forward-facing volume. The study benchmarks multiple camera- and LiDAR-based SSC methods, revealing how sensor density, depth estimation, and domain shifts affect performance and highlighting clear cross-domain generalization gaps. By enabling cross-dataset tests and unified labeling, SSCBench aims to drive robust, generalizable SSC solutions for real-world autonomous driving. The work also identifies limitations and points to temporal extensions as a key direction for future SSC benchmarks.
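Voxel-based SSC evaluation is commonly reported as two numbers: a geometry IoU that treats every voxel as simply occupied or empty, and a semantic mIoU averaged over classes. The sketch below shows that convention on flattened voxel grids. It is an illustration, not SSCBench's evaluation code; the label IDs (0 for empty space, 255 for unknown voxels that are excluded from scoring) and the choice to skip classes absent from both prediction and ground truth are assumptions.

```python
from typing import Dict, List, Sequence

EMPTY = 0      # assumed label ID for free space
UNKNOWN = 255  # assumed label ID for unknown/unobserved ground-truth voxels


def ssc_metrics(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> Dict[str, float]:
    """Geometry IoU and semantic mIoU over flattened voxel grids.

    Assumes predicted labels lie in [0, num_classes); class 0 is EMPTY and
    classes 1..num_classes-1 are semantic. Ground-truth UNKNOWN voxels are skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of voxels")
    tp = fp = fn = 0                  # occupied-vs-empty confusion counts
    inter: List[int] = [0] * num_classes  # per-class intersection counts
    union: List[int] = [0] * num_classes  # per-class union counts
    for p, g in zip(pred, gt):
        if g == UNKNOWN:
            continue  # unknown voxels do not count toward any metric
        p_occ, g_occ = p != EMPTY, g != EMPTY
        if p_occ and g_occ:
            tp += 1
        elif p_occ:
            fp += 1
        elif g_occ:
            fn += 1
        # Count each voxel once in the union of every class it touches.
        if p_occ:
            union[p] += 1
        if g_occ and g != p:
            union[g] += 1
        if p_occ and p == g:
            inter[p] += 1
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    class_ious = [inter[c] / union[c] for c in range(1, num_classes) if union[c] > 0]
    miou = sum(class_ious) / len(class_ious) if class_ious else 0.0
    return {"iou": iou, "miou": miou}


if __name__ == "__main__":
    # Last voxel is unknown in the ground truth, so it is ignored.
    print(ssc_metrics([0, 1, 2, 2, 1], [0, 1, 1, 2, 255], num_classes=3))
    # prints {'iou': 1.0, 'miou': 0.5}
```

The example shows why both metrics are reported: the prediction recovers the occupied geometry perfectly (IoU 1.0) while confusing one voxel's class, which halves both per-class IoUs.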

Abstract

Monocular scene understanding is a foundational component of autonomous systems. Among monocular perception tasks, semantic scene completion (SSC) is especially valuable for holistic 3D scene understanding: it jointly infers the semantics and the complete geometry of a scene from RGB input. However, progress in SSC, particularly in large-scale street views, is hindered by the scarcity of high-quality datasets. To address this issue, we introduce SSCBench, a comprehensive benchmark that integrates scenes from widely used automotive datasets (KITTI-360, nuScenes, and Waymo). SSCBench follows a setup and format already established in the community, making it easy to explore SSC methods across varied street views. We benchmark models using monocular, trinocular, and point cloud input to assess the performance gap resulting from sensor coverage and modality. Moreover, we have unified semantic labels across diverse datasets to simplify cross-domain generalization testing. We commit to including more datasets and SSC models to drive further advancements in this field.
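Unifying semantic labels across datasets usually comes down to a per-dataset lookup table from raw class names to a shared class list, with unmatched classes sent to an ignore ID. The sketch below illustrates that idea only. The unified class list, the raw class names, and the ignore ID 255 are all hypothetical examples chosen for illustration; they are not SSCBench's actual mapping and should be checked against each dataset's own label documentation.

```python
from typing import Dict, List

IGNORE = 255  # hypothetical ID for raw classes with no unified counterpart

# Hypothetical unified class list (SSCBench's real label set may differ).
UNIFIED_CLASSES = ["empty", "car", "person", "road", "vegetation"]
UNIFIED_ID = {name: i for i, name in enumerate(UNIFIED_CLASSES)}

# Illustrative raw-name -> unified-name tables, one per source dataset.
RAW_TO_UNIFIED: Dict[str, Dict[str, str]] = {
    "kitti360": {"car": "car", "person": "person", "road": "road",
                 "vegetation": "vegetation"},
    "nuscenes": {"vehicle.car": "car", "human.pedestrian.adult": "person",
                 "flat.driveable_surface": "road", "static.vegetation": "vegetation"},
    "waymo": {"TYPE_CAR": "car", "TYPE_PEDESTRIAN": "person",
              "TYPE_ROAD": "road", "TYPE_VEGETATION": "vegetation"},
}


def to_unified(dataset: str, raw_labels: List[str]) -> List[int]:
    """Map one dataset's raw class names to unified IDs; unmapped names become IGNORE."""
    table = RAW_TO_UNIFIED[dataset]
    return [UNIFIED_ID[table[r]] if r in table else IGNORE for r in raw_labels]


if __name__ == "__main__":
    print(to_unified("nuscenes", ["vehicle.car", "flat.driveable_surface",
                                  "movable_object.barrier"]))
    # prints [1, 3, 255]
```

Because every dataset maps into the same ID space, a model trained on one subset can be scored on another without changing the evaluation code, which is what makes cross-domain generalization tests straightforward.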

Paper Structure

This paper contains 13 sections, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Visualizations of SSCBench derived from KITTI-360 liao2022kitti, nuScenes caesar2020nuscenes, and Waymo sun2020scalability. We showcase accurate SSC ground truth in a variety of street views.
  • Figure 2: Top row: dynamic-object synchronization. Two examples on nuScenes caesar2020nuscenes are shown; when dynamic objects are not handled, aggregating frames produces spatio-temporal tubes that degrade label accuracy. Bottom row: unknown-voxel exclusion. Voxels are marked as unknown (grey) when they are occluded or were never probed by the LiDAR.
  • Figure 3: Statistical analysis. Top row: label distribution (LD) of different datasets. Bottom row: scale comparisons between SSCBench and SemanticKITTI.
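Figure 2 attributes the spatio-temporal tubes to aggregating frames without handling dynamic objects. One common way to avoid them, sketched below under stated assumptions, is to move the points lying inside a tracked object's box at a source frame to where that box sits at the reference frame, instead of leaving them at their old world positions. This is not the paper's documented procedure. The `BoxPose` type, the yaw-only box rotation, and the assumption that points are already in a shared world frame are all hypothetical simplifications.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class BoxPose:
    """Hypothetical tracked box: center, yaw about z, and size (length, width, height)."""
    x: float
    y: float
    z: float
    yaw: float
    length: float
    width: float
    height: float


def to_box_frame(p: Point, b: BoxPose) -> Point:
    # Translate to the box center, then rotate by -yaw (inverse box rotation).
    dx, dy, dz = p[0] - b.x, p[1] - b.y, p[2] - b.z
    c, s = math.cos(b.yaw), math.sin(b.yaw)
    return (c * dx + s * dy, -s * dx + c * dy, dz)


def from_box_frame(q: Point, b: BoxPose) -> Point:
    # Rotate by +yaw, then translate back to the box center.
    c, s = math.cos(b.yaw), math.sin(b.yaw)
    return (c * q[0] - s * q[1] + b.x, s * q[0] + c * q[1] + b.y, q[2] + b.z)


def inside(q: Point, b: BoxPose) -> bool:
    # q is expressed in the box frame.
    return (abs(q[0]) <= b.length / 2 and abs(q[1]) <= b.width / 2
            and abs(q[2]) <= b.height / 2)


def synchronize(points: List[Point], src_box: BoxPose, ref_box: BoxPose) -> List[Point]:
    """Re-place points on the object at the source frame at its reference-frame pose.

    Points outside the source box are treated as static and returned unchanged.
    """
    out: List[Point] = []
    for p in points:
        q = to_box_frame(p, src_box)
        out.append(from_box_frame(q, ref_box) if inside(q, src_box) else p)
    return out


if __name__ == "__main__":
    src = BoxPose(0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 2.0)
    ref = BoxPose(10.0, 0.0, 0.0, math.pi / 2, 4.0, 2.0, 2.0)
    print(synchronize([(1.0, 0.5, 0.0), (5.0, 5.0, 0.0)], src, ref))
```

Keeping each point's coordinates fixed in the box frame is what prevents a moving car from being smeared along its trajectory; in practice this would be repeated for every tracked object and every aggregated frame.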