S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction
Guangting Zheng, Jiajun Deng, Xiaomeng Chu, Yu Yuan, Houqiang Li, Yanyong Zhang
TL;DR
This work tackles the scalability bottlenecks of large-scale street scene reconstruction with 3D Gaussian Splatting by introducing S3R-GS, a streamlined pipeline that eliminates unnecessary local-to-global transforms via instance-specific projection, reduces 3D-to-2D projections with temporal visibility, and renders distant content efficiently through adaptive LOD. It further improves practicality by using BEV-semantic initialization and 2D box-based NeuralODE motion modeling to handle in-the-wild scenarios without 3D bounding boxes. The approach yields state-of-the-art rendering quality and substantial speedups on the Argoverse 2, KITTI, and nuScenes datasets, demonstrating strong scalability and applicability to real-world driving scenes. Overall, S3R-GS provides a practical, high-performance framework for dynamic street scene reconstruction with reduced annotation burden and improved robustness.
Abstract
Recently, 3D Gaussian Splatting (3DGS) has reshaped the field of photorealistic 3D reconstruction, achieving impressive rendering quality and speed. However, when applied to large-scale street scenes, existing methods suffer from per-viewpoint reconstruction costs that escalate rapidly as scene size increases, leading to significant computational overhead. After revisiting the conventional pipeline, we identify three key factors accounting for this issue: unnecessary local-to-global transformations, excessive 3D-to-2D projections, and inefficient rendering of distant content. To address these challenges, we propose S3R-GS, a 3DGS framework that Streamlines the pipeline for large-scale Street Scene Reconstruction, effectively mitigating these limitations. Moreover, most existing street 3DGS methods rely on ground-truth 3D bounding boxes to separate dynamic and static components, but 3D bounding boxes are difficult to obtain, which limits real-world applicability. To address this, we propose an alternative solution based on 2D boxes, which are easier to annotate and can also be predicted by off-the-shelf vision foundation models. Together, these designs make S3R-GS readily adaptable to large, in-the-wild scenarios. Extensive experiments demonstrate that S3R-GS enhances rendering quality and significantly accelerates reconstruction. Remarkably, when applied to videos from the challenging Argoverse 2 dataset, it achieves state-of-the-art PSNR and SSIM while reducing reconstruction time to below 50%, and even 20%, of that of competing methods.
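To make two of the cost factors named in the abstract concrete (excessive 3D-to-2D projections and inefficient rendering of distant content), the following is a minimal illustrative sketch, not the S3R-GS implementation. It assumes a crude view-cone test in place of a real frustum test and fixed distance thresholds in place of the paper's adaptive LOD; the names `visible_mask`, `lod_level`, `max_depth`, `min_cos`, and `thresholds` are hypothetical and do not come from the paper.

```python
import numpy as np

def visible_mask(points_world, cam_pos, cam_forward, max_depth, min_cos):
    """Boolean mask of points that are in front of the camera, within range,
    and inside a view cone (a simplified stand-in for a frustum test).

    Only points passing this mask would need a 3D-to-2D projection.
    All parameters are illustrative assumptions, not the paper's settings.
    """
    offsets = points_world - cam_pos              # (N, 3) vectors camera -> point
    dist = np.linalg.norm(offsets, axis=1)        # Euclidean distance per point
    forward = cam_forward / np.linalg.norm(cam_forward)
    depth = offsets @ forward                     # signed distance along the view axis
    # Cosine of the angle to the view axis; stays 0 for a point at the camera.
    cos_angle = np.divide(depth, dist, out=np.zeros_like(depth), where=dist > 0)
    return (depth > 0) & (dist <= max_depth) & (cos_angle >= min_cos)

def lod_level(dist, thresholds):
    """Map distances to a coarse level-of-detail index.

    `thresholds` must be sorted ascending; level 0 is the finest.
    Fixed thresholds are an assumption made for this sketch.
    """
    return np.searchsorted(thresholds, dist, side="right")

# Toy scene: four Gaussian centers, camera at the origin looking along +z.
points = np.array([
    [0.0, 0.0, 10.0],    # near, straight ahead
    [0.0, 0.0, -10.0],   # behind the camera
    [0.0, 0.0, 500.0],   # ahead, but beyond max_depth
    [100.0, 0.0, 1.0],   # far off to the side
])
mask = visible_mask(points, np.zeros(3), np.array([0.0, 0.0, 1.0]),
                    max_depth=100.0, min_cos=0.5)
levels = lod_level(np.array([5.0, 50.0, 500.0]), np.array([20.0, 100.0]))
```

In this toy setup only one of the four points survives the mask, which illustrates why the number of projected primitives need not grow with total scene size when a given viewpoint sees only a small part of the scene.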
