MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction
Yaopeng Lou, Liao Shen, Tianqi Liu, Jiaqi Li, Zihao Huang, Huiqiang Sun, Zhiguo Cao
TL;DR
MuGS addresses generalizable novel view synthesis across varying input baselines by fusing multi-view stereo (MVS) and monocular depth estimation (MDE) cues within a 3D Gaussian splatting framework. A projection-and-sampling depth fusion refines the depth volume, guided by a learned consistency cue and enhanced by a lightweight attention mechanism, while a reference-view loss provides contextual supervision. The method achieves state-of-the-art or competitive results on small- and large-baseline datasets and shows promising zero-shot performance, all without per-scene optimization. It combines robust MDE features with precise MVS depth guidance to retain fast rendering while improving geometry reconstruction, and ablations confirm the contributions of depth refinement, feature augmentation, and reference supervision.
Abstract
We present Multi-Baseline Gaussian Splatting (MuGS), a generalized feed-forward approach for novel view synthesis that effectively handles diverse baseline settings, including sparse input views with both small and large baselines. Specifically, we integrate features from Multi-View Stereo (MVS) and Monocular Depth Estimation (MDE) to enhance feature representations for generalizable reconstruction. Next, we propose a projection-and-sampling mechanism for deep depth fusion, which constructs a fine probability volume to guide the regression of the feature map. Furthermore, we introduce a reference-view loss to improve geometry and optimization efficiency. We leverage 3D Gaussian representations to accelerate training and inference while enhancing rendering quality. MuGS achieves state-of-the-art performance across multiple baseline settings and diverse scenarios ranging from simple objects (DTU) to complex indoor and outdoor scenes (RealEstate10K). We also demonstrate promising zero-shot performance on the LLFF and Mip-NeRF 360 datasets. Code is available at https://github.com/EuclidLou/MuGS.
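As a rough illustration of what a probability volume over depth is, the sketch below shows the expectation-based depth regression commonly used in MVS pipelines: per-pixel scores over a set of depth hypotheses are normalized with a softmax, and the depth estimate is the probability-weighted mean of the hypotheses. This is a generic technique and not necessarily how MuGS builds or uses its volume; the function names, the `(D, H, W)` layout, and the toy inputs are assumptions made for this example only.

```python
import numpy as np


def softmax(x, axis=0):
    # Subtract the per-pixel maximum first so the exponentials cannot overflow.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def depth_from_probability_volume(scores, depth_hypotheses):
    """Turn per-pixel matching scores into a probability volume and a depth map.

    scores:            array of shape (D, H, W), higher means a better match
                       (hypothetical layout chosen for this sketch).
    depth_hypotheses:  array of shape (D,), the candidate depth values.
    Returns (prob, depth) with shapes (D, H, W) and (H, W).
    """
    prob = softmax(scores, axis=0)
    # Expected depth per pixel: sum over d of prob[d] * depth_hypotheses[d].
    depth = np.tensordot(depth_hypotheses, prob, axes=(0, 0))
    return prob, depth


if __name__ == "__main__":
    hypotheses = np.array([1.0, 2.0, 3.0, 4.0])
    scores = np.zeros((4, 2, 3))
    scores[2] = 50.0  # strongly favor the third hypothesis at every pixel
    prob, depth = depth_from_probability_volume(scores, hypotheses)
    print(prob.shape, depth.shape)
    print(depth)
```

When the scores are flat, the estimate falls back to the mean of the hypotheses; when one hypothesis dominates, the estimate converges to that hypothesis. A finer volume, in this picture, simply means more closely spaced hypotheses around the likely depth.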
