MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo
Byeonggwon Lee, Junkyu Park, Khang Truong Giang, Sungho Jo, Soohwan Song
TL;DR
This paper tackles online, high-fidelity 3D model generation for neural rendering from RGB streams, addressing the depth ambiguities that degrade renderings. It introduces a two-branch pipeline: a frontend that performs camera tracking and online MVS depth estimation with MVSFormer over a local time window, and a backend that densifies and optimizes the 3D Gaussian Splatting (3DGS) model in parallel, using filtered depths from sequential views to initialize Gaussian points. The approach includes a depth refinement step via V-Fuse and a PSNR-based detection of unexplored regions, so that the scene is densified efficiently with adaptive density control through differentiable rendering. Experiments on indoor (Replica, TUM-RGBD) and outdoor (Tanks and Temples, aerial) datasets show that the method outperforms state-of-the-art dense SLAM methods, with robust outdoor performance and detailed reconstructions.
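To make the PSNR-based detection of unexplored regions more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the tile-wise formulation, the tile size, the 20 dB threshold, and the function names `psnr` and `low_psnr_tiles` are all assumptions chosen for illustration. The idea is only that image regions where the current rendering disagrees strongly with the observed frame are likely not yet covered by Gaussians.

```python
import math


def psnr(rendered, observed, max_val=1.0):
    """PSNR in dB between two equally sized flat lists of pixel intensities."""
    mse = sum((r - o) ** 2 for r, o in zip(rendered, observed)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical tiles: no rendering error
    return 10.0 * math.log10(max_val ** 2 / mse)


def low_psnr_tiles(rendered, observed, tile=2, threshold_db=20.0):
    """Return (tile_row, tile_col) indices whose PSNR is below threshold_db.

    `rendered` and `observed` are 2D lists of grayscale values in [0, max_val].
    The tile size and threshold are hypothetical values, not from the paper.
    """
    height, width = len(rendered), len(rendered[0])
    flagged = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            rows = range(top, min(top + tile, height))
            cols = range(left, min(left + tile, width))
            r_px = [rendered[y][x] for y in rows for x in cols]
            o_px = [observed[y][x] for y in rows for x in cols]
            if psnr(r_px, o_px) < threshold_db:
                flagged.append((top // tile, left // tile))
    return flagged
```

In a pipeline of this kind, the flagged tiles would be the candidates for inserting new Gaussians from the filtered depth map, while well-rendered tiles are left to the regular optimization.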
Abstract
This study addresses the challenge of online 3D model generation for neural rendering using an RGB image stream. Previous research has tackled this issue by incorporating Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS) as scene representations within dense SLAM methods. However, most studies focus primarily on estimating coarse 3D scenes rather than achieving detailed reconstructions. Moreover, depth estimation based solely on images is often ambiguous, resulting in low-quality 3D models that lead to inaccurate renderings. To overcome these limitations, we propose a novel framework for high-quality 3DGS modeling that leverages an online multi-view stereo (MVS) approach. Our method estimates MVS depth using sequential frames from a local time window and applies comprehensive depth refinement techniques to filter out outliers, enabling accurate initialization of Gaussians in 3DGS. Furthermore, we introduce a parallelized backend module that optimizes the 3DGS model efficiently, ensuring timely updates with each new keyframe. Experimental results demonstrate that our method outperforms state-of-the-art dense SLAM methods, particularly excelling in challenging outdoor environments.
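The step of initializing Gaussians from filtered MVS depth can be illustrated with a standard pinhole back-projection. The sketch below is an assumption-laden illustration, not the authors' code: the per-pixel `confidence` map, the `min_conf` threshold, and the function name `backproject_depth` are hypothetical, and the paper's actual outlier filtering is more comprehensive than a single threshold. Each surviving pixel yields one 3D point in world coordinates that could serve as an initial Gaussian center.

```python
def backproject_depth(depth, confidence, fx, fy, cx, cy,
                      rotation, translation, min_conf=0.5):
    """Back-project valid depth pixels to world-space 3D points.

    depth, confidence: 2D lists indexed as [v][u] (row, column).
    fx, fy, cx, cy: pinhole intrinsics in pixels.
    rotation (3x3 nested list) and translation (length 3): camera-to-world pose.
    min_conf is a hypothetical stand-in for the depth outlier filter.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            # Skip missing depths and pixels rejected by the filter.
            if d <= 0.0 or confidence[v][u] < min_conf:
                continue
            # Pixel (u, v) with depth d -> point in the camera frame.
            cam = ((u - cx) * d / fx, (v - cy) * d / fy, d)
            # Camera frame -> world frame: X_w = R * X_c + t.
            world = tuple(
                sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            points.append(world)
    return points
```

Filtering before back-projection matters because every retained point becomes a Gaussian that the backend must then optimize; outlier depths would seed floating artifacts that the density control has to remove later.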
