HD$^2$-SSC: High-Dimension High-Density Semantic Scene Completion for Autonomous Driving

Zhiwen Yang, Yuxin Peng

TL;DR

Camera-based semantic scene completion (SSC) suffers from an input-output dimension gap and an annotation-density gap when inferring dense 3D occupancy from 2D images. HD$^2$-SSC tackles both gaps. For the dimension gap, High-dimension Semantic Decoupling (HSD) expands and decouples coarse pixel semantics into high-dimensional voxelized representations, and Semantic Aggregation clusters and differentiates those semantics via cross-attention and a decoupling loss. For the density gap, High-density Occupancy Refinement (HOR), a detect-and-refine pipeline, aligns geometric and semantic voxel distributions to fill missing voxels and correct erroneous ones, improving semantic density. On SemanticKITTI and SSCBench-KITTI-360, HD$^2$-SSC achieves state-of-the-art IoU and mIoU, validating the effectiveness of decoupled semantic expansion and distribution-aligned refinement for practical autonomous-driving perception.

Abstract

Camera-based 3D semantic scene completion (SSC) plays a crucial role in autonomous driving, enabling voxelized 3D scene understanding for effective scene perception and decision-making. Existing SSC methods have shown efficacy in improving 3D scene representations, but suffer from the inherent input-output dimension gap and annotation-reality density gap, where the 2D planar view from input images with sparsely annotated labels leads to inferior prediction of real-world dense occupancy with a 3D stereoscopic view. In light of this, we propose the corresponding High-Dimension High-Density Semantic Scene Completion (HD$^2$-SSC) framework with expanded pixel semantics and refined voxel occupancies. To bridge the dimension gap, a High-dimension Semantic Decoupling module is designed to expand 2D image features along a pseudo third dimension, decoupling coarse pixel semantics from occlusions, and then identify focal regions with fine semantics to enrich image features. To mitigate the density gap, a High-density Occupancy Refinement module is devised with a "detect-and-refine" architecture to leverage contextual geometric and semantic structures for enhanced semantic density with the completion of missing voxels and correction of erroneous ones. Extensive experiments and analyses on the SemanticKITTI and SSCBench-KITTI-360 datasets validate the effectiveness of our HD$^2$-SSC framework.
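
The TL;DR reports results as IoU and mIoU. In SSC benchmarks these usually measure different things: IoU scores geometric completion (occupied versus empty voxels, ignoring class), while mIoU averages the per-class IoU over the semantic classes. The NumPy sketch below illustrates that distinction and is not the authors' evaluation code. The function name `ssc_metrics`, the empty label `0`, and the ignore label `255` are assumptions made for illustration. As a simplification, classes absent from both prediction and ground truth are skipped rather than counted.

```python
import numpy as np

def ssc_metrics(pred, gt, num_classes, empty_label=0, ignore_label=255):
    """Return (IoU, mIoU) for two voxel label grids of the same shape.

    Assumed conventions (hypothetical, not taken from the paper):
    `empty_label` marks free space and `ignore_label` marks unlabeled voxels.
    """
    # Drop voxels without a usable ground-truth label.
    valid = gt != ignore_label
    pred, gt = pred[valid], gt[valid]

    # Geometric IoU: binary overlap of occupied voxels, class-agnostic.
    pred_occ, gt_occ = pred != empty_label, gt != empty_label
    union = np.logical_or(pred_occ, gt_occ).sum()
    iou = np.logical_and(pred_occ, gt_occ).sum() / union if union else float("nan")

    # Semantic mIoU: mean of per-class IoU over non-empty classes
    # that appear in the prediction or the ground truth.
    per_class = []
    for c in range(num_classes):
        if c == empty_label:
            continue
        inter = np.logical_and(pred == c, gt == c).sum()
        uni = np.logical_or(pred == c, gt == c).sum()
        if uni:
            per_class.append(inter / uni)
    miou = float(np.mean(per_class)) if per_class else float("nan")
    return float(iou), miou

# Toy example: six voxels, two semantic classes plus empty.
pred = np.array([0, 1, 1, 2, 2, 0])
gt = np.array([0, 1, 2, 2, 255, 1])
iou, miou = ssc_metrics(pred, gt, num_classes=3)
print(f"IoU={iou:.3f}, mIoU={miou:.3f}")
```

In the toy example the prediction finds three of the four occupied voxels, so the geometric IoU is high, while confusing classes 1 and 2 pulls the mIoU down. This is why the two numbers are reported separately.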

Paper Structure

This paper contains 34 sections, 12 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Illustrations of (a) Dimension Gap, highlighting the disparity between input coarse pixel semantics with occlusion and output distinct fine voxel semantics, (b) Density Gap, depicting the difference between annotated sparse labels and ground-truth dense occupancies.
  • Figure 2: The overall architecture of our HD$^2$-SSC. The High-dimension Semantic Decoupling (HSD) module expands and decouples coarse pixel semantics with orthogonal loss, then aggregates high-dimension voxelized semantics via semantic clustering with decoupling loss. The High-density Occupancy Refinement (HOR) module adopts a "detect-and-refine" architecture to identify geometric and semantic critical voxels, whose overall distributions are aligned for consistent contextual details.
  • Figure 3: Illustration of aggregating the high-dimension voxelized semantics concerning the semantic clusters with decoupling loss.
  • Figure 4: Effect of the expanded dimension on the SSC performance, evaluated on the SemanticKITTI validation set.
  • Figure 5: Visualization results of SSC prediction on the SemanticKITTI validation set. We highlight the occupancy ground truth with blue boxes, false SSC predictions of the best comparison method SGN with red boxes, and the improved SSC predictions from our HD$^2$-SSC approach with green boxes. Better viewed when zoomed in.
  • ...and 1 more figures