Towards Temporal Fusion Beyond the Field of View for Camera-based Semantic Scene Completion

Jongseong Bae; Junwoo Ha; Jinnyeong Heo; Yeongin Lee; Ha Young Kim

TL;DR

This work tackles the challenge of completing 3D scenes beyond the current camera view in camera-based semantic scene completion (SSC). It introduces C3DFusion, a temporal geometry fusion module that directly aligns and fuses 3D-lifted point features from current and past frames in the current frame's metric space. Two complementary techniques, historical context blurring and current-centric feature densification, reduce noise from past frames and emphasize current information. The approach yields state-of-the-art results on SemanticKITTI and SSCBench-KITTI-360 and generalizes well to other SSC architectures, with notable improvements in out-of-view regions. By enabling robust out-of-frame completion with an efficient, generalizable design, C3DFusion has strong potential to enhance perception reliability in autonomous driving and related 3D perception tasks.

Abstract

Recent camera-based 3D semantic scene completion (SSC) methods have increasingly explored leveraging temporal cues to enrich the features of the current frame. However, these approaches primarily focus on enhancing in-frame regions and often struggle to reconstruct critical out-of-frame areas near the sides of the ego-vehicle, even though previous frames commonly contain valuable contextual information about these unseen regions. To address this limitation, we propose the Current-Centric Contextual 3D Fusion (C3DFusion) module, which generates hidden-region-aware 3D feature geometry by explicitly aligning 3D-lifted point features from both current and historical frames. C3DFusion performs enhanced temporal fusion through two complementary techniques, historical context blurring and current-centric feature densification, which respectively suppress noise from inaccurately warped historical point features by attenuating their scale and enhance current point features by increasing their volumetric contribution. Simply integrated into standard SSC architectures, C3DFusion demonstrates strong effectiveness, significantly outperforming state-of-the-art methods on the SemanticKITTI and SSCBench-KITTI-360 datasets. Furthermore, it exhibits robust generalization, achieving notable performance gains when applied to other baseline models.
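
To make the fusion idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each frame provides 3D-lifted points with per-point features and that the relative ego pose between frames is known. All names (`warp_to_current`, `densify`, `splat`, `fuse`, `hist_scale`) are hypothetical. Historical context blurring is approximated here by a simple scalar attenuation of past features, and current-centric feature densification by replicating each current point into its six axis-neighbor voxels; the paper's actual operators may differ.

```python
import numpy as np

def warp_to_current(points_past, T_cur_from_past):
    """Rigidly transform (N, 3) past-frame points into the current ego frame."""
    homo = np.hstack([points_past, np.ones((len(points_past), 1))])
    return (homo @ T_cur_from_past.T)[:, :3]

def densify(points, feats, voxel_size):
    """Assumed stand-in for densification: copy each point to its 6 axis neighbors."""
    offsets = np.vstack([np.zeros((1, 3)), np.eye(3), -np.eye(3)]) * voxel_size
    dense_pts = (points[:, None, :] + offsets[None, :, :]).reshape(-1, 3)
    dense_feats = np.repeat(feats, len(offsets), axis=0)
    return dense_pts, dense_feats

def splat(points, feats, grid_min, voxel_size, grid_shape):
    """Sum point features into a dense voxel grid, dropping out-of-grid points."""
    idx = np.floor((points - np.asarray(grid_min)) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros((*grid_shape, feats.shape[1]))
    np.add.at(grid, tuple(idx[inside].T), feats[inside])
    return grid

def fuse(cur_pts, cur_feats, past_pts, past_feats, T_cur_from_past,
         grid_min, voxel_size, grid_shape, hist_scale=0.5):
    """Fuse current and past point features in the current frame's metric space."""
    # Align historical points with the current frame.
    past_in_cur = warp_to_current(past_pts, T_cur_from_past)
    # Assumed stand-in for blurring: attenuate the scale of historical features.
    past_grid = splat(past_in_cur, hist_scale * past_feats,
                      grid_min, voxel_size, grid_shape)
    # Give current points a larger volumetric footprint before voxelizing.
    dense_pts, dense_feats = densify(cur_pts, cur_feats, voxel_size)
    cur_grid = splat(dense_pts, dense_feats, grid_min, voxel_size, grid_shape)
    return cur_grid + past_grid
```

In this sketch, a past-frame point that falls outside the current camera view still contributes a (down-weighted) feature to the fused grid after warping, which is the property that motivates fusing in the current frame's 3D space rather than in image space.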
