
StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting

Xiao Cui, Weicai Ye, Yifan Wang, Guofeng Zhang, Wengang Zhou, Houqiang Li

TL;DR

This work introduces StreetSurfGS, which the authors present as the first Gaussian Splatting method tailored to scalable surface reconstruction of urban street scenes. It uses a planar-based octree representation and segmented training to reduce memory costs, accommodate the camera characteristics of street captures, and improve scalability.

Abstract

Reconstructing urban street scenes is crucial for applications such as autonomous driving and urban planning. These scenes are characterized by long and narrow camera trajectories, occlusion, complex object relationships, and data sparsity across multiple scales. Despite recent advances, existing surface reconstruction methods are designed primarily for object-centric scenarios and struggle to adapt to these characteristics. To address this challenge, we introduce StreetSurfGS, the first method to employ Gaussian Splatting specifically tailored for scalable urban street scene surface reconstruction. StreetSurfGS uses a planar-based octree representation and segmented training to reduce memory costs, accommodate the unique camera characteristics, and ensure scalability. Additionally, to mitigate depth inaccuracies caused by object overlap, we propose a guided smoothing strategy within regularization that eliminates inaccurate boundary points and outliers. Furthermore, to address sparse views and multi-scale challenges, we use a dual-step matching strategy that leverages both adjacent and long-term information. Extensive experiments validate the efficacy of StreetSurfGS in both novel view synthesis and surface reconstruction.
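The abstract does not say how the trajectory is divided for segmented training. As a rough illustration only, the sketch below splits a long, narrow camera path into overlapping segments by travelled distance. The function name `split_trajectory` and the parameters `segment_length` and `overlap` are hypothetical and are not taken from the paper.

```python
import numpy as np


def split_trajectory(positions, segment_length, overlap):
    """Split camera centers of shape (N, 3) into overlapping index segments.

    Hypothetical helper: segments are cut by travelled distance along the
    path, and consecutive segments share `overlap` units of that distance.
    """
    positions = np.asarray(positions, dtype=float)
    if segment_length <= overlap:
        raise ValueError("segment_length must exceed overlap")

    # Cumulative arc length of every camera along the trajectory.
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(steps)])
    total = arc[-1]

    segments = []
    start = 0.0
    while True:
        end = start + segment_length
        # Indices of cameras whose arc length falls inside this window.
        segments.append(np.flatnonzero((arc >= start) & (arc <= end)))
        if end >= total:
            break
        # Step back by `overlap` so neighbouring segments share cameras.
        start = end - overlap
    return segments


# Ten cameras spaced one unit apart along a straight street.
cams = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)
segs = split_trajectory(cams, segment_length=4.0, overlap=1.0)
```

Each segment could then be optimized separately so that only part of the scene is resident in memory at a time, with the shared cameras keeping neighbouring segments consistent. Whether StreetSurfGS splits by distance, by frame count, or by some other rule is not stated in this section.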

Paper Structure (21 sections, 7 equations, 15 figures, 8 tables)

Figures (15)

  • Figure 1: We present StreetSurfGS, a framework for scalable street scene surface reconstruction from RGB images using Gaussian Splatting, without the need for LiDAR or pretrained geometry estimation models for supervision. Shown in the figure are untextured meshes, normals, and textured meshes.
  • Figure 2: Illustration of our edge filtering strategy. We use SAM to generate a mask, followed by the extraction of width-defined boundaries for removal within the smoothness constraint.
  • Figure 3: Illustration of our far-field matching strategy. We enforce long-term constraints by considering the consensus regions of distant frames to reduce accumulated error.
  • Figure 4: Overall qualitative comparison on KITTI-360 (Liao et al., 2022). Our filtering strategy enables accurate depth shifts, while our matching strategy effectively models objects at various distances. Additionally, our method reconstructs flat roads and generates meshes of significantly higher quality than previous methods.
  • Figure 5: Visual comparison for novel view synthesis on KITTI-360 (Liao et al., 2022). Previous approaches depend predominantly on immediate frame data, which often results in blurred object boundaries. In contrast, our method extends the data scope by combining adjacent-frame information with long-term spatial constraints, which sharply reduces such blurring.
  • ...and 10 more figures
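
The edge filtering idea in the Figure 2 caption (take a segmentation mask, extract a boundary band of a chosen width, and drop that band from the smoothness constraint) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the integer label map stands in for a SAM output, the loss is a generic first-order depth smoothness term, and the names `boundary_band` and `masked_smoothness` are hypothetical.

```python
import numpy as np


def boundary_band(labels, width):
    """Boolean mask of pixels within a band of `width` pixels around label changes."""
    labels = np.asarray(labels)
    edge = np.zeros(labels.shape, dtype=bool)

    # Mark the pixels on both sides of every horizontal and vertical label change.
    diff_h = labels[:, 1:] != labels[:, :-1]
    edge[:, 1:] |= diff_h
    edge[:, :-1] |= diff_h
    diff_v = labels[1:, :] != labels[:-1, :]
    edge[1:, :] |= diff_v
    edge[:-1, :] |= diff_v

    # Grow the band with a 3x3 square dilation, (width - 1) times.
    band = edge.copy()
    h, w = band.shape
    for _ in range(max(width - 1, 0)):
        padded = np.pad(band, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(band)
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                grown |= padded[dy:dy + h, dx:dx + w]
        band = grown
    return band


def masked_smoothness(depth, keep):
    """Mean absolute depth difference over neighbouring pixel pairs that are both kept."""
    depth = np.asarray(depth, dtype=float)
    gx = np.abs(depth[:, 1:] - depth[:, :-1])
    gy = np.abs(depth[1:, :] - depth[:-1, :])
    mx = keep[:, 1:] & keep[:, :-1]
    my = keep[1:, :] & keep[:-1, :]
    count = mx.sum() + my.sum()
    if count == 0:
        return 0.0
    return float((gx[mx].sum() + gy[my].sum()) / count)


# Toy scene: two objects side by side with a depth jump between them.
labels = np.zeros((4, 6), dtype=int)
labels[:, 3:] = 1
depth = np.where(labels == 0, 1.0, 5.0)

band = boundary_band(labels, width=1)
loss_unfiltered = masked_smoothness(depth, np.ones_like(band))
loss_filtered = masked_smoothness(depth, ~band)
```

In the toy scene the only depth variation is the jump between the two objects, so excluding the boundary band removes the entire penalty. The intent is that a smoothness term no longer pulls depth across genuine object boundaries while still regularizing the interior of each region.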