Three Cars Approaching within 100m! Enhancing Distant Geometry by Tri-Axis Voxel Scanning for Camera-based Semantic Scene Completion

Jongseong Bae; Junwoo Ha; Ha Young Kim

TL;DR

Camera-based semantic scene completion (SSC) often underestimates distant geometry because of perspective and occlusion. The authors propose ScanSSC, which refines distant voxels by propagating near-viewpoint context through two components: a Scan Module that applies axis-wise masked self-attention, and a Scan Loss that uses cumulatively averaged logits to transfer information along the depth, width, and height axes. The approach yields state-of-the-art results on SemanticKITTI and SSCBench-KITTI-360, with ablations showing gains from each component and from their combination. The work targets the distance-dependent imbalance in completion quality and offers a practical mechanism for improving 3D scene completion in autonomous driving. Overall, ScanSSC shows that targeted near-to-far refinement can substantially improve distant geometry reconstruction while maintaining efficiency.

Abstract

Camera-based Semantic Scene Completion (SSC) is gaining attention in the 3D perception field. However, properties such as perspective and occlusion lead to the underestimation of geometry in distant regions, posing a critical issue for safety-focused autonomous driving systems. To tackle this, we propose ScanSSC, a novel camera-based SSC model composed of a Scan Module and a Scan Loss, both designed to enhance distant scenes by leveraging context from near-viewpoint scenes. The Scan Module uses axis-wise masked attention, with each axis employing near-to-far cascade masking that enables distant voxels to capture relationships with preceding voxels. In addition, the Scan Loss computes the cross-entropy along each axis between cumulative logits and the corresponding class distributions in a near-to-far direction, thereby propagating rich context-aware signals to distant voxels. Leveraging the synergy between these components, ScanSSC achieves state-of-the-art performance, with IoUs of 44.54 and 48.29 and mIoUs of 17.40 and 20.14 on the SemanticKITTI and SSCBench-KITTI-360 benchmarks, respectively.
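The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: index 0 along each axis is treated as the nearest position, the cascade mask is taken to be a lower-triangular (causal) mask, attention is single-head without learned projections, "cumulative logits" are taken to be a running mean, and the target class distribution is a running mean of one-hot labels. All function names are hypothetical.

```python
import numpy as np


def near_to_far_mask(n):
    """Boolean (n, n) mask: position i may attend only to positions j <= i.

    Assumes index 0 is nearest to the viewpoint (an illustrative choice).
    """
    return np.tril(np.ones((n, n), dtype=bool))


def masked_self_attention(x, mask):
    """Single-head self-attention along one axis, without learned projections.

    x: float array of shape (n, d), one feature vector per voxel on the axis.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)       # hide farther positions
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


def cumulative_mean(a, axis):
    """Running mean along `axis`, starting from index 0 (assumed nearest)."""
    counts = np.arange(1, a.shape[axis] + 1)
    shape = [1] * a.ndim
    shape[axis] = -1
    return np.cumsum(a, axis=axis) / counts.reshape(shape)


def scan_loss_sketch(logits, labels, axis):
    """Cross-entropy between cumulatively averaged logits and a running
    class distribution along one spatial axis.

    logits: (X, Y, Z, C) floats; labels: (X, Y, Z) ints; axis in {0, 1, 2}.
    The running one-hot mean used as the target is an assumption.
    """
    num_classes = logits.shape[-1]
    onehot = np.eye(num_classes)[labels]
    avg_logits = cumulative_mean(logits, axis)
    target = cumulative_mean(onehot, axis)
    z = avg_logits - avg_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-(target * log_probs).sum(axis=-1).mean())
```

Under these assumptions, a tri-axis variant would call `scan_loss_sketch` once each with `axis=0`, `axis=1`, and `axis=2` and combine the three values; how the paper weights or combines them is not stated in this section.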
