Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis

Zhiyuan Min, Yawei Luo, Jianwen Sun, Yi Yang

TL;DR

We propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints and surpasses state-of-the-art baselines that rely on epipolar priors.

Abstract

Generalizable 3D Gaussian splatting (3DGS) can reconstruct new scenes from sparse-view observations in a feed-forward manner, eliminating the scene-specific retraining required by conventional 3DGS. However, existing methods rely heavily on epipolar priors, which can be unreliable in complex real-world scenes, particularly in non-overlapping and occluded regions. In this paper, we propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints. To enhance multiview feature extraction with 3D perception, we employ a self-supervised Vision Transformer (ViT) with cross-view completion pre-training on large-scale datasets. Additionally, we introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. Unlike existing purely geometry-free methods, eFreeSplat focuses on achieving epipolar-free feature matching and encoding by providing 3D priors through cross-view pre-training. We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality. Project page: https://tatakai1.github.io/efreesplat/.
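
To make "epipolar-free" concrete, here is a minimal sketch of cross-view attention in which every token of one view attends to all tokens of the other view, with no mask restricting the search to an epipolar line. It is an illustration only and not the authors' implementation: the function name `cross_view_attention`, the toy token values, and the single-head, projection-free form are all assumptions made for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(queries, keys, values):
    """Hypothetical single-head cross-attention between two views.

    Each query token (view A) attends to ALL key tokens (view B).
    No epipolar mask is applied, so the weights depend only on
    feature similarity, not on camera geometry.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for q in queries:
        # Scaled dot-product score against every token of the other view.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted sum of the other view's value tokens.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy feature tokens (assumed values, two channels each).
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attn = cross_view_attention(view_a, view_b, view_b)
```

An epipolar-based method would instead zero out (or never compute) the weights for tokens lying off the epipolar line of each query; here every weight is non-zero, which is what lets matching proceed in non-overlapping or occluded regions where that line is uninformative.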

Paper Structure

This paper contains 16 sections, 9 equations, 55 figures, 5 tables.

Figures (55)

  • Figure 1: Epipolar priors can be unreliable across extremely sparse views, especially in non-overlapping or occluded areas. Our model, eFreeSplat, generalizes to novel scenes without relying on epipolar priors, offering superior appearance and geometric perception.
  • Figure 2: Overview of eFreeSplat. (a) Epipolar-free Cross-view Mutual Perception leverages self-supervised cross-view completion pre-training [weinzaepfel2023croco] to extract robust 3D priors. A ViT [dosovitskiy2020image] with shared weights processes the reference images, followed by a cross-attention decoder that generates multiview feature maps, forming 3D perception without epipolar priors. (b) The Iterative Cross-view Gaussians Alignment module iteratively refines Gaussian attributes through a 2D U-Net. The process warps features to align corresponding features and depths, ensuring consistent depth scales across different views. (c) The final step employs rasterization-based volume rendering [kerbl20233d] to generate high-quality geometry and realistic novel view images.
  • Figure 5: Our method reconstructs more reliable results than MVSplat when the overlap between reference views is low. In the histogram, the blue bars represent the frequency at which our method exceeds MVSplat in rendering quality at a given overlap level, while the orange bars indicate the opposite.
  • ...and 50 more figures