SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park, Seungtae Nam, Hyeonsoo Im, Sangheon Shin, Sangpil Kim, Eunbyung Park

TL;DR

SelfSplat tackles pose-free, 3D prior-free reconstruction from unposed multi-view images by combining explicit 3D Gaussian Splatting with self-supervised depth and pose estimation. It introduces a matching-aware pose network and a depth refinement module to achieve cross-view geometric consistency without per-scene finetuning, and it leverages pixel-aligned Gaussians for fast, differentiable rendering. Across RealEstate10K, ACID, and DL3DV, SelfSplat delivers superior appearance and geometry quality and demonstrates strong cross-dataset generalization, validated by extensive ablations. This work advances scalable, pose-free 3D scene understanding with practical, rasterization-based rendering for real-world applications.

Abstract

We propose SelfSplat, a novel 3D Gaussian Splatting model designed to perform pose-free and 3D prior-free generalizable 3D reconstruction from unposed multi-view images. These settings are inherently ill-posed due to the lack of ground-truth data and learned geometric information, and the need to achieve accurate 3D reconstruction without finetuning, making it difficult for conventional methods to achieve high-quality results. Our model addresses these challenges by effectively integrating explicit 3D representations with self-supervised depth and pose estimation techniques, resulting in reciprocal improvements in both pose accuracy and 3D reconstruction quality. Furthermore, we incorporate a matching-aware pose estimation network and a depth refinement module to enhance geometry consistency across views, ensuring more accurate and stable 3D reconstructions. To demonstrate the performance of our method, we evaluate it on large-scale real-world datasets, including RealEstate10K, ACID, and DL3DV. SelfSplat achieves superior results over previous state-of-the-art methods in both appearance and geometry quality, and also demonstrates strong cross-dataset generalization capabilities. Extensive ablation studies and analysis further validate the effectiveness of our proposed methods. Code and pretrained models are available at https://gynjn.github.io/selfsplat/
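
To make the link between depth and pose estimation concrete, the following is a minimal sketch of the geometric step that self-supervised depth and pose methods commonly rely on: a pixel with predicted depth is lifted to 3D, moved by the predicted relative pose, and projected into another view. It is not taken from the SelfSplat code. The pinhole intrinsics (fx, fy, cx, cy), the convention that (R, t) maps reference-camera coordinates to target-camera coordinates, and all function names are assumptions made for illustration.

```python
def unproject(pixel, depth, fx, fy, cx, cy):
    """Lift a pixel with known depth to a 3D point in the reference camera frame."""
    u, v = pixel
    return [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]


def transform(R, t, X):
    """Apply the assumed reference-to-target rigid transform X' = R X + t."""
    return [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]


def project(X, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates (pinhole model)."""
    x, y, z = X
    return (fx * x / z + cx, fy * y / z + cy)


def reproject(pixel, depth, R, t, fx, fy, cx, cy):
    """Map a reference pixel to its location in the target view."""
    X_ref = unproject(pixel, depth, fx, fy, cx, cy)
    X_tgt = transform(R, t, X_ref)
    return project(X_tgt, fx, fy, cx, cy)


# Hypothetical example values: identity rotation, small sideways translation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
example = reproject((320.0, 240.0), 4.0, IDENTITY, [0.2, 0.0, 0.0],
                    500.0, 500.0, 320.0, 240.0)
```

In a self-supervised setup, the colour sampled at the reprojected location would typically be compared with the colour at the original pixel, so that errors in either depth or pose show up as a photometric mismatch. The same unprojection step is one plausible way to place a pixel-aligned Gaussian center in 3D.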

Paper Structure

This paper contains 30 sections, 10 equations, 14 figures, 14 tables.

Figures (14)

  • Figure 1: Overview of SelfSplat. Given unposed multi-view images as input, we predict depth and Gaussian attributes from the images, as well as the relative camera poses between them. We unify a self-supervised depth estimation framework with an explicit 3D representation, achieving accurate scene reconstruction.
  • Figure 2: Matching-aware pose network (a) and depth refinement module (b). We leverage cross-view features from input images to achieve accurate camera pose estimation, and use these estimated poses to further refine the depth maps with spatial awareness.
  • Figure 3: Qualitative comparison of novel view synthesis on RE10k (top two rows) and ACID (bottom row) datasets.
  • Figure 4: Qualitative comparison of novel view synthesis on DL3DV dataset.
  • Figure 5: Epipolar lines visualization. We draw the lines from reference to target frame using relative camera pose.
  • ...and 9 more figures
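
The epipolar line visualization described for Figure 5 can be illustrated with standard two-view geometry. The sketch below is an assumption-laden illustration, not the authors' plotting code: it assumes pinhole intrinsics, a relative pose (R, t) that maps reference-camera coordinates to target-camera coordinates, and hypothetical helper names. Under those assumptions, the fundamental matrix is F = K_tgt^{-T} [t]_x R K_ref^{-1}, and a reference pixel x maps to the target-image line l = F x.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def intrinsics_inverse(fx, fy, cx, cy):
    """Closed-form inverse of the pinhole matrix [[fx,0,cx],[0,fy,cy],[0,0,1]]."""
    return [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]


def fundamental_matrix(K_ref_inv, K_tgt_inv, R, t):
    """F = K_tgt^{-T} [t]_x R K_ref^{-1} for the assumed pose convention."""
    E = matmul(skew(t), R)  # essential matrix
    return matmul(matmul(transpose(K_tgt_inv), E), K_ref_inv)


def epipolar_line(F, pixel):
    """Return (a, b, c) with a*u + b*v + c = 0 for the line in the target image."""
    u, v = pixel
    col = matmul(F, [[u], [v], [1.0]])
    return [col[0][0], col[1][0], col[2][0]]


def point_line_distance(line, pixel):
    """Distance in pixels from a target-image point to the line."""
    a, b, c = line
    return abs(a * pixel[0] + b * pixel[1] + c) / math.hypot(a, b)


def project(X, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in camera coordinates."""
    return (fx * X[0] / X[2] + cx, fy * X[1] / X[2] + cy)
```

With an accurate relative pose, the true correspondence of a reference pixel lies on its epipolar line in the target frame, so the point-to-line distance gives a simple visual and numerical check on pose quality.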