SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
Junyan Ye, Qiyan Luo, Jinhua Yu, Huaping Zhong, Zhimeng Zheng, Conghui He, Weijia Li
TL;DR
This work tackles cross-view semantic segmentation of fine-grained building attributes by fusing satellite and street-view data through a BEV-inspired mapping. The proposed SG-BEV framework introduces a Satellite-Guided Reprojection (SGR) module that counters the uneven feature distribution of conventional BEV methods and maps street-view facade details continuously into the top-down satellite space, complemented by a learnable cross-view fusion mechanism. Across four city datasets, the method reports average mIoU gains of 10.13% and 5.21% over state-of-the-art satellite-based and cross-view methods, respectively, supporting its effectiveness at capturing building attributes such as land use and floor count. The approach offers multi-perspective building understanding with practical implications for urban planning and monitoring. Depth-guided reprojection plays a key role in aligning features across views and in projecting facade features densely onto building interiors in the top-down view.
Abstract
This paper aims to achieve fine-grained building attribute segmentation in a cross-view scenario, i.e., using satellite and street-view image pairs. The main challenge lies in overcoming the significant perspective differences between street views and satellite views. In this work, we introduce SG-BEV, a novel approach for satellite-guided BEV fusion for cross-view semantic segmentation. To overcome the limitations of existing cross-view projection methods in capturing complete building facade features, we incorporate a Bird's Eye View (BEV) method to establish a spatially explicit mapping of street-view features. Moreover, we leverage the advantages of multiple perspectives by introducing a novel satellite-guided reprojection module, which mitigates the uneven feature distribution associated with traditional BEV methods. Our method demonstrates significant improvements on four cross-view datasets collected from multiple cities, including New York, San Francisco, and Boston. On average across these datasets, our method increases mIoU by 10.13% and 5.21% compared with the state-of-the-art satellite-based and cross-view methods, respectively. The code and datasets of this work will be released at https://github.com/yejy53/SG-BEV.
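
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean Intersection over Union (mIoU) is commonly computed for a semantic segmentation map. It is not taken from the SG-BEV code release: the function name `mean_iou`, the choice to skip classes absent from both prediction and ground truth, and the toy labels are all assumptions made for illustration, and the evaluation protocol used in the paper may differ in details such as ignored labels or per-dataset averaging.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    # Assumption: classes absent from both maps are left out of the mean.
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())

# Toy example: four pixels, two classes (e.g. two land-use categories).
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
print(round(score, 4))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Averaging per class rather than per pixel keeps rare building attributes from being swamped by frequent ones.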
