SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion

Xiyue Guo; Jiarui Hu; Junjie Hu; Hujun Bao; Guofeng Zhang

TL;DR

The paper tackles occlusion-limited 3D semantic scene completion (SSC) by introducing SGFormer, a satellite-ground cooperative framework that encodes ground and satellite images in parallel branches and unifies them in a common domain. It introduces a ground-view guided satellite correction strategy, a feature transformation based on deformable self- and cross-attention, and an adaptive fusion module that balances the contributions of the two views, enabling robust occupancy and semantic predictions. In experiments on SemanticKITTI and SSCBench-KITTI-360, SGFormer achieves state-of-the-art performance among camera-based methods and results competitive with LiDAR-based approaches, supporting the value of combining satellite imagery, which supplies global scene context, with ground views, which supply dynamic detail. The work shows that low-cost satellite data can alleviate occlusion-induced ambiguities and improve SSC, with practical implications for autonomous driving and remote sensing; the code is publicly available.
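
The adaptive fusion idea can be illustrated with a minimal sketch. It assumes both branches have already produced feature vectors for the same set of voxels and weights them with a per-voxel softmax over two scalar confidence scores. The linear scorers `w_g` and `w_s` and the function names are hypothetical stand-ins for whatever learned layer SGFormer actually uses; this is not taken from the released code.

```python
import math
from typing import List, Sequence, Tuple


def softmax2(a: float, b: float) -> Tuple[float, float]:
    """Numerically stable softmax over two scores; the results sum to 1."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    total = ea + eb
    return ea / total, eb / total


def adaptive_fuse(
    ground: Sequence[Sequence[float]],
    satellite: Sequence[Sequence[float]],
    w_g: Sequence[float],
    w_s: Sequence[float],
) -> List[List[float]]:
    """Blend per-voxel features from the ground and satellite branches.

    ground[i] and satellite[i] are C-dim features for voxel i.
    w_g and w_s are HYPOTHETICAL linear scorers standing in for a learned
    weighting layer; each voxel gets its own pair of weights.
    """
    fused = []
    for fg, fs in zip(ground, satellite):
        # Confidence score of each branch for this voxel.
        score_g = sum(w * f for w, f in zip(w_g, fg))
        score_s = sum(w * f for w, f in zip(w_s, fs))
        alpha_g, alpha_s = softmax2(score_g, score_s)
        # Convex combination: where one view is unreliable, the other can dominate.
        fused.append([alpha_g * g + alpha_s * s for g, s in zip(fg, fs)])
    return fused
```

With zero scorers both views get weight 0.5, so `adaptive_fuse([[2.0, 0.0]], [[0.0, 4.0]], [0.0, 0.0], [0.0, 0.0])` returns `[[1.0, 2.0]]`, the plain average; a large ground score pushes the result toward the ground feature instead.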

Abstract

Recently, camera-based solutions have been extensively explored for semantic scene completion (SSC). Despite their success in visible areas, existing methods struggle to capture complete scene semantics because of frequent visual occlusions. To address this limitation, this paper presents SGFormer, the first satellite-ground cooperative SSC framework, which explores the potential of satellite-ground image pairs for the SSC task. Specifically, we propose a dual-branch architecture that encodes the orthogonal satellite and ground views in parallel and unifies them into a common domain. Additionally, we design a ground-view guidance strategy that corrects satellite image biases during feature encoding, addressing misalignment between the satellite and ground views. Moreover, we develop an adaptive weighting strategy that balances the contributions of the satellite and ground views. Experiments demonstrate that SGFormer outperforms the state of the art on the SemanticKITTI and SSCBench-KITTI-360 datasets. Our code is available at https://github.com/gxytcrc/SGFormer.
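
Because the satellite view is orthogonal (top-down) to the ground view, one simple way to bring it into a shared voxel domain is to look up, for each voxel column, the satellite pixel directly above it and broadcast that value along the height axis. The sketch below assumes an orthorectified satellite crop aligned with the vehicle heading, a known ground sampling distance `gsd` in metres per pixel, the vehicle at pixel `(cu, cv)`, and a SemanticKITTI-style grid (0.2 m voxels, x forward from 0 m, y to the left from -25.6 m). All names are hypothetical, and this is not SGFormer's encoder, which the abstract describes only as unifying both views in a common domain.

```python
import math
from typing import List, Tuple

# ASSUMED SemanticKITTI-style SSC grid: 0.2 m voxels, x forward from 0 m,
# y (left) from -25.6 m, vehicle frame with z up.
VOXEL_SIZE = 0.2
X_MIN, Y_MIN = 0.0, -25.6


def voxel_to_satellite_pixel(
    i: int, j: int, k: int, gsd: float, cu: float, cv: float
) -> Tuple[float, float]:
    """Map voxel (i, j, k) to (u, v) in a heading-aligned satellite crop.

    gsd is the ground sampling distance in metres per pixel and (cu, cv) is the
    vehicle's pixel position; forward points up the image, left points left.
    """
    x = X_MIN + (i + 0.5) * VOXEL_SIZE  # voxel centre, metres forward
    y = Y_MIN + (j + 0.5) * VOXEL_SIZE  # voxel centre, metres to the left
    # k is deliberately unused: a top-down view sees a whole voxel column at one pixel.
    u = cu - y / gsd  # column index grows to the right, i.e. toward -y
    v = cv - x / gsd  # row index grows downward, i.e. toward -x
    return u, v


def lift_bev_to_voxels(
    bev: List[List[float]], nx: int, ny: int, nz: int, gsd: float, cu: float, cv: float
) -> List[List[List[float]]]:
    """Broadcast a scalar BEV map into an nx x ny x nz volume (zero outside the image)."""
    height, width = len(bev), len(bev[0])
    vol = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            u, v = voxel_to_satellite_pixel(i, j, 0, gsd, cu, cv)
            col, row = math.floor(u), math.floor(v)  # pixel containing (u, v)
            if 0 <= row < height and 0 <= col < width:
                vol[i][j] = [bev[row][col]] * nz  # same value at every height
    return vol
```

By construction this broadcast gives the satellite branch no height information of its own, so height-dependent structure would have to come from the ground branch or from later learned layers.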
