FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction
Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee
TL;DR
FreeSplat++ addresses indoor whole-scene reconstruction with a generalizable 3D Gaussian Splatting (3DGS) framework by introducing a low-cost cross-view aggregation pipeline, Pixel-wise Triplet Fusion to reduce Gaussian redundancy, and a Weighted Floater Removal strategy to suppress floaters. A depth-regularized per-scene fine-tuning stage further enhances rendering quality while preserving geometric accuracy. The approach yields substantial improvements over prior generalizable 3DGS methods in both region and whole-scene tasks, with fewer Gaussians and shorter training times, particularly when handling long input sequences. These innovations collectively enable efficient, accurate explicit 3D representations for large-scale indoor scenes and offer a practical alternative to per-scene optimization in many contexts.
Abstract
The integration of efficient feed-forward schemes into 3D Gaussian Splatting (3DGS) has recently been actively explored. However, most existing methods focus on sparse-view reconstruction of small regions and cannot produce satisfactory whole-scene reconstruction results in terms of either quality or efficiency. In this paper, we propose FreeSplat++, which extends generalizable 3DGS into an alternative approach for large-scale indoor whole-scene reconstruction, with the potential to significantly accelerate reconstruction and improve geometric accuracy. To facilitate whole-scene reconstruction, we first propose the Low-cost Cross-View Aggregation framework to efficiently process extremely long input sequences. Next, we introduce a carefully designed Pixel-wise Triplet Fusion method that incrementally aggregates the overlapping 3D Gaussian primitives from multiple views, adaptively reducing their redundancy. Furthermore, we propose a Weighted Floater Removal strategy that effectively reduces floaters and serves as an explicit depth fusion approach, which is crucial in whole-scene reconstruction. After the feed-forward reconstruction of 3DGS primitives, we investigate a depth-regularized per-scene fine-tuning process. Using the dense, multi-view consistent depth maps obtained during the feed-forward prediction phase as an extra constraint, we refine the entire scene's 3DGS primitives to enhance rendering quality while preserving geometric accuracy. Extensive experiments confirm that FreeSplat++ significantly outperforms existing generalizable 3DGS methods, especially in whole-scene reconstruction. Compared to conventional per-scene optimized 3DGS approaches, our method with depth-regularized per-scene fine-tuning demonstrates substantial improvements in reconstruction accuracy and a notable reduction in training time.
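To make the idea of depth-regularized fine-tuning concrete, the following is a minimal sketch of one plausible objective: a photometric term plus a weighted penalty that keeps the rendered depth close to the depth predicted in the feed-forward phase. The abstract does not give the actual loss, so the L1 form of both terms, the function names `mean_abs_error` and `depth_regularized_loss`, and the `depth_weight` parameter are all illustrative assumptions rather than the authors' formulation.

```python
from typing import Sequence


def mean_abs_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute (L1) difference between two equally sized flat buffers."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def depth_regularized_loss(
    rendered_rgb: Sequence[float],
    target_rgb: Sequence[float],
    rendered_depth: Sequence[float],
    feedforward_depth: Sequence[float],
    depth_weight: float = 0.1,  # hypothetical weight, not taken from the paper
) -> float:
    """Photometric L1 term plus a weighted L1 depth-consistency term (assumed form)."""
    photometric = mean_abs_error(rendered_rgb, target_rgb)
    # The feed-forward depth acts as a fixed geometric prior during fine-tuning.
    depth_term = mean_abs_error(rendered_depth, feedforward_depth)
    return photometric + depth_weight * depth_term


# Toy usage with flattened pixel buffers.
loss = depth_regularized_loss(
    rendered_rgb=[0.5, 0.5, 0.5, 0.5],
    target_rgb=[0.5, 0.25, 0.5, 0.75],
    rendered_depth=[2.0, 2.0],
    feedforward_depth=[2.0, 3.0],
    depth_weight=0.5,
)
print(loss)
```

In this sketch, a larger `depth_weight` favors staying close to the feed-forward geometry, while a smaller one lets the photometric term dominate; in a real pipeline the same structure would be applied to rendered tensors inside an autodiff framework.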
