Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction
Xiufeng Huang, Ka Chun Cheung, Runmin Cong, Simon See, Renjie Wan
TL;DR
Stereo-GS tackles the high resource demands of generalizable 3D Gaussian Splatting (3DGS) by disentangling geometry from appearance within a multi-view stereo framework. It takes a diffusion-generated set of multi-view images and uses a stereo vision backbone to produce dense multi-view feature tokens. These tokens are fused with global attention to predict geometry as point-maps and appearance as Gaussian features, which together form GS-maps. A two-stage training scheme and a refinement network reduce reliance on data priors, enabling pose-free, robust 3DGS reconstruction with improved training and inference efficiency. Experiments on multi-view and single-image-to-3D tasks show state-of-the-art quality at practical resource cost, and the approach also applies to real-world scenes with faster turnaround times.
Abstract
Generalizable 3D Gaussian Splatting reconstruction showcases advanced Image-to-3D content creation but requires substantial computational resources and large datasets, which makes training models from scratch challenging. Current methods usually entangle the prediction of 3D Gaussian geometry and appearance, relying heavily on data-driven priors and resulting in slow regression speeds. To address this, we propose Stereo-GS, a disentangled framework for efficient 3D Gaussian prediction. Our method extracts features from local image pairs using a stereo vision backbone and fuses them via global attention blocks. Dedicated point and Gaussian prediction heads generate multi-view point-maps for geometry and Gaussian features for appearance, which are combined as GS-maps to represent the 3DGS object. A refinement network enhances these GS-maps for high-quality reconstruction. Unlike existing methods that depend on camera parameters, our approach achieves pose-free 3D reconstruction, improving robustness and practicality. By reducing resource demands while maintaining high-quality outputs, Stereo-GS provides an efficient, scalable solution for real-world 3D content generation.
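
To make the GS-map idea concrete, the following is a minimal NumPy sketch of how per-view point-maps (geometry) and per-pixel Gaussian features (appearance) could be combined into GS-maps and flattened into a set of 3D Gaussians. It is an illustration under stated assumptions, not the authors' implementation: the function names `assemble_gs_maps` and `flatten_gs_maps`, the 11-channel feature layout (opacity, scale, rotation quaternion, RGB), and the activation functions are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used here to squash raw values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def assemble_gs_maps(point_maps, gaussian_feats):
    """Combine geometry and appearance into per-pixel Gaussian parameters.

    point_maps:     (V, H, W, 3) 3D position predicted for every pixel.
    gaussian_feats: (V, H, W, 11) raw appearance features, assumed layout:
                    [opacity(1), log-scale(3), quaternion(4), rgb(3)].
    Returns an array of shape (V, H, W, 14).
    """
    if point_maps.shape[:3] != gaussian_feats.shape[:3]:
        raise ValueError("point-maps and Gaussian features must share V, H, W")
    if point_maps.shape[-1] != 3 or gaussian_feats.shape[-1] != 11:
        raise ValueError("expected 3 position and 11 feature channels")

    # Assumed activations: these keep each parameter in a valid range.
    opacity = sigmoid(gaussian_feats[..., 0:1])           # in (0, 1)
    scale = np.exp(gaussian_feats[..., 1:4])              # strictly positive
    quat = gaussian_feats[..., 4:8]
    quat = quat / (np.linalg.norm(quat, axis=-1, keepdims=True) + 1e-8)
    color = sigmoid(gaussian_feats[..., 8:11])            # in (0, 1)

    # Geometry comes from the point head, appearance from the Gaussian head.
    return np.concatenate([point_maps, opacity, scale, quat, color], axis=-1)


def flatten_gs_maps(gs_maps):
    """Merge all views and pixels into one list of Gaussians, shape (V*H*W, 14)."""
    return gs_maps.reshape(-1, gs_maps.shape[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views, height, width = 4, 8, 8  # toy sizes, chosen only for illustration
    points = rng.normal(size=(views, height, width, 3))
    feats = rng.normal(size=(views, height, width, 11))
    gaussians = flatten_gs_maps(assemble_gs_maps(points, feats))
    print(gaussians.shape)
```

In this sketch every pixel of every view contributes one Gaussian whose center is taken directly from the point-map, which reflects the disentanglement described above: the geometry channels are never touched by the appearance activations.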
