FSFSplatter: Build Surface and Novel Views with Sparse-Views within 2min

Yibin Zhao, Yihan Pan, Jun Nan, Liwei Chen, Jianjun Yi

TL;DR

The paper tackles fast, accurate surface reconstruction and novel-view synthesis (NVS) from free-sparse RGB inputs. It introduces FSFSplatter, a Transformer-based pipeline that performs end-to-end dense Gaussian initialization and differentiable camera-parameter estimation, followed by geometry-enhanced scene optimization supervised by monocular depth and multi-view features. Key contributions include a self-splitting Gaussian head for densification, contribution-based pruning to remove floaters, and differentiable camera poses integrated into rasterization. The method reports state-of-the-art results on DTU, Replica, and BlendedMVS for both surface reconstruction and NVS under sparse views. It reduces overfitting to sparse views, speeds up optimization, and remains robust across object- and scene-level datasets, which makes it practical for rapid 3D reconstruction in real-world, unconstrained capture scenarios.

Abstract

Gaussian Splatting has become a leading reconstruction technique, known for its high-quality novel view synthesis and detailed reconstruction. However, most existing methods require dense, calibrated views. Reconstructing from free sparse images often yields poor surfaces due to limited overlap and overfitting. We introduce FSFSplatter, a new approach for fast surface reconstruction from free sparse images. Our method integrates end-to-end dense Gaussian initialization, camera parameter estimation, and geometry-enhanced scene optimization. Specifically, FSFSplatter employs a large Transformer to encode multi-view images and generates a dense and geometrically consistent Gaussian scene initialization via a self-splitting Gaussian head. It eliminates local floaters through contribution-based pruning and mitigates overfitting to limited views by leveraging depth and multi-view feature supervision with differentiable camera parameters during rapid optimization. FSFSplatter outperforms current state-of-the-art methods on the widely used DTU, Replica, and BlendedMVS datasets.
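
The abstract names contribution-based pruning without spelling out the criterion. One plausible reading, offered here only as a minimal sketch, is to accumulate each Gaussian's alpha-blending weight over all rendered rays and drop Gaussians whose total falls below a threshold. The function names `blending_weights` and `prune_by_contribution`, the ray representation, and the `threshold` parameter are all hypothetical and are not taken from the paper.

```python
def blending_weights(alphas):
    """Front-to-back alpha-compositing weights for one ray.

    The weight of the i-th Gaussian is alpha_i times the transmittance
    left over after all Gaussians in front of it.
    """
    weights = []
    transmittance = 1.0
    for alpha in alphas:
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def prune_by_contribution(rays, num_gaussians, threshold):
    """Keep Gaussians whose summed blending weight reaches `threshold`.

    rays: list of rays; each ray is a list of (gaussian_id, alpha)
          pairs sorted front to back (assumed layout, for illustration).
    Returns (kept_ids, per_gaussian_contribution).
    """
    contribution = [0.0] * num_gaussians
    for ray in rays:
        weights = blending_weights([alpha for _, alpha in ray])
        for (gaussian_id, _), weight in zip(ray, weights):
            contribution[gaussian_id] += weight
    kept = [g for g in range(num_gaussians) if contribution[g] >= threshold]
    return kept, contribution
```

In this sketch a Gaussian that is mostly occluded or nearly transparent in every view accumulates little weight and is removed, which matches the intuition of a floater that explains almost no pixels. The actual criterion and threshold used by FSFSplatter may differ.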

Paper Structure

This paper contains 36 sections, 11 equations, 8 figures, 11 tables.

Figures (8)

  • Figure 1: We address the inherent ambiguity and local view overfitting in the reconstruction of free-sparse images, proposing a novel pipeline named FSFSplatter that enables surface reconstruction and novel view synthesis within 2 minutes. FSFSplatter significantly reduces the number of iterations required while decreasing surface reconstruction error by at least 28.39% and novel view synthesis error by at least 46.19%. The figure illustrates the scene "scan63" from the DTU dataset.
  • Figure 2: Overview of FSFSplatter. We employ a large alternating attention network as the backbone and generate high-dimensional semi-dense Gaussian scenes through independent heads. These are subsequently mapped into a dense Gaussian scene via a self-splitting densification MLP, forming an end-to-end framework for dense Gaussian initialization and camera parameter estimation. Because the Gaussian scene is constructed differentiably from the initialization and is jointly supervised by depth, multi-view stereo features, and RGB images during optimization, the scene optimization is geometrically enhanced, which effectively mitigates erroneous surfaces caused by overfitting to free-sparse views.
  • Figure 3: The end-to-end dense Gaussian initialization and camera parameter estimation take sparse images as input. Image tokens are processed via a DINO Encoder and alternating attention mechanisms. A Camera Head and a DPT Head perform back-projection and depth estimation, yielding a semi-dense Gaussian scene. This scene is partitioned into multiple patches, which are then fed into a densification MLP to produce a dense Gaussian scene serving as the initialization.
  • Figure 4: Explanation of dense Gaussian initialization. By incorporating explicit normal estimation and an MLP-based implicit Gaussian densification process, the quality of both NVS and surface reconstruction is significantly improved.
  • Figure 5: After initialization, a contribution-based Gaussian pruning process is first applied to eliminate erroneous floaters present in the dense Gaussians. Both the Gaussian primitives and the camera poses are treated as differentiable variables. Multi-view RGB, depth, and normal maps are rendered through rasterization. Various loss functions are constructed based on monocular depth estimation and multi-view high-dimensional feature extraction, with subsequent backpropagation.
  • ...and 3 more figures
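
The back-projection step mentioned in the Figure 3 caption can be illustrated with a standard pinhole model: each pixel with a predicted depth is lifted into camera space and then moved into world space with the estimated pose. The sketch below is a generic illustration under stated assumptions, not the paper's implementation. The function name `backproject`, the camera-to-world convention for `R` and `t`, and the choice to treat integer pixel indices as pixel centers are all assumptions.

```python
def backproject(depth, fx, fy, cx, cy, R, t):
    """Lift a depth map to 3D world points with a pinhole camera model.

    depth: H x W nested list of depths along the camera z-axis;
           non-positive values are treated as invalid and skipped.
    fx, fy, cx, cy: pinhole intrinsics in pixels.
    R, t: 3x3 rotation and 3-vector translation, assumed camera-to-world.
    Returns a list of (x, y, z) world-space tuples in row-major pixel order.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth at this pixel
            # Pixel (u, v) with depth z in camera coordinates.
            cam = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Rigid transform into world coordinates: R @ cam + t.
            world = tuple(
                sum(R[i][k] * cam[k] for k in range(3)) + t[i]
                for i in range(3)
            )
            points.append(world)
    return points
```

Each returned point could serve as the center of one Gaussian in a semi-dense initialization. How FSFSplatter assigns the remaining Gaussian attributes (scale, rotation, opacity, color) is not described in the captions above and is left out here.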