SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
Chen Chen, Zhirui Wang, Taowei Sheng, Yi Jiang, Yundu Li, Peirui Cheng, Luning Zhang, Kaiqiang Chen, Yanfeng Hu, Xue Yang, Xian Sun
TL;DR
This work tackles the limitations of street-view-only 3D occupancy prediction by integrating satellite imagery aligned via GPS/IMU poses. The authors introduce SA-Occ, featuring a Satellite BEV branch with 3D-Proj Guidance, a Street BEV branch with Uniform Sampling Alignment, and a Dynamic-Decoupling Fusion module to handle temporal asynchrony and dynamic objects. They also curate the Occ3D-NuScenes Extension Dataset to enable real-time satellite-street cross-view evaluation. Empirically, SA-Occ achieves a new state-of-the-art mIoU of 39.05% for single-frame input with only a small latency increase (6.93 ms), demonstrating the practical value of satellite-aware cross-view perception for autonomous driving.
Abstract
Existing vision-based 3D occupancy prediction methods are inherently limited in accuracy due to their exclusive reliance on street-view imagery, neglecting the potential benefits of incorporating satellite views. We propose SA-Occ, the first Satellite-Assisted 3D occupancy prediction model, which leverages GPS & IMU to integrate historical yet readily available satellite imagery into real-time applications, effectively mitigating limitations of ego-vehicle perception such as occlusions and degraded performance in distant regions. To address the core challenges of cross-view perception, we propose: 1) Dynamic-Decoupling Fusion, which resolves inconsistencies in dynamic regions caused by the temporal asynchrony between satellite and street views; 2) 3D-Proj Guidance, a module that enhances 3D feature extraction from inherently 2D satellite imagery; and 3) Uniform Sampling Alignment, which aligns the sampling density between street and satellite views. Evaluated on Occ3D-nuScenes, SA-Occ achieves state-of-the-art performance, especially among single-frame methods, with a 39.05% mIoU (a 6.97% improvement), while incurring only 6.93 ms of additional latency per frame. Our code and newly curated dataset are available at https://github.com/chenchen235/SA-Occ.
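The intuition behind handling dynamic regions separately can be illustrated with a toy sketch. Because the satellite image is historical, any vehicles it shows will not match the live scene, so one simple option is to use satellite features only in cells judged to be static. The code below is not the SA-Occ implementation: the function name `fuse_bev`, the per-cell `dynamic_prob` map, the hard threshold, and the additive fusion are all assumptions chosen for illustration, and each BEV map is reduced to a single-channel grid of floats.

```python
from typing import List

Grid = List[List[float]]


def fuse_bev(street: Grid, satellite: Grid, dynamic_prob: Grid,
             threshold: float = 0.5) -> Grid:
    """Toy fusion of two single-channel BEV grids (hypothetical helper).

    Cells whose dynamic probability is below `threshold` are treated as
    static and receive the sum of street and satellite features. All other
    cells are treated as dynamic and keep the street feature only, since
    the historical satellite image cannot describe moving objects.
    """
    # All three grids must describe the same BEV layout.
    shapes = {(len(g), tuple(len(row) for row in g))
              for g in (street, satellite, dynamic_prob)}
    if len(shapes) != 1:
        raise ValueError("street, satellite and dynamic_prob must share a shape")

    fused: Grid = []
    for s_row, t_row, p_row in zip(street, satellite, dynamic_prob):
        fused.append([
            s + t if p < threshold else s
            for s, t, p in zip(s_row, t_row, p_row)
        ])
    return fused


# Illustrative 2x2 example: the top-left and bottom-right cells are dynamic.
street = [[1.0, 1.0], [1.0, 1.0]]
satellite = [[3.0, 3.0], [3.0, 3.0]]
dynamic_prob = [[0.9, 0.1], [0.2, 0.8]]
fused = fuse_bev(street, satellite, dynamic_prob)
```

In this toy setup the static cells combine both views while the dynamic cells fall back to the street view alone. A learned module would presumably replace the hard threshold and the addition with trainable components operating on multi-channel features.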
