Pose-free 3D Gaussian splatting via shape-ray estimation

Youngju Na, Taeyeon Kim, Jumin Lee, Kyu Beom Han, Woo Jae Kim, Sung-eui Yoon

TL;DR

SHARE tackles the challenge of pose-free, generalizable 3D Gaussian splatting by introducing a pose-aware canonical volume that fuses multi-view features without explicit 3D pose alignment. It jointly estimates relative camera rays and 3D Gaussians, and uses an anchor-based coarse-to-fine scheme to refine local geometry around coarse anchors. The two main components—Ray-guided Multi-view Fusion and Anchor-aligned Gaussian Prediction—enable robust, feed-forward pose-free novel view synthesis across diverse real-world datasets and demonstrate strong cross-dataset generalization. This approach offers an efficient, scalable alternative for pose-free 3D reconstruction in sparse-view scenarios, with practical implications for real-world rendering and scene understanding.

Abstract

While generalizable 3D Gaussian splatting enables efficient, high-quality rendering of unseen scenes, it heavily depends on precise camera poses for accurate geometry. In real-world scenarios, obtaining accurate poses is challenging, leading to noisy pose estimates and geometric misalignments. To address this, we introduce SHARE, a pose-free, feed-forward Gaussian splatting framework that overcomes these ambiguities by jointly estimating shape and camera rays. Instead of relying on explicit 3D transformations, SHARE builds a pose-aware canonical volume representation that seamlessly integrates multi-view information, reducing misalignment caused by inaccurate pose estimates. Additionally, anchor-aligned Gaussian prediction enhances scene reconstruction by refining local geometry around coarse anchors, allowing for more precise Gaussian placement. Extensive experiments on diverse real-world datasets show that our method achieves robust performance in pose-free generalizable Gaussian splatting. Code is available at https://github.com/youngju-na/SHARE
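
To make the idea of refining geometry around coarse anchors concrete, here is a minimal sketch, not the authors' implementation. It assumes that each coarse anchor is a 3D point and that the refinement takes the form of k small 3D offsets per anchor, so that each Gaussian center is an anchor plus an offset. The function name `gaussians_from_anchors` and the offset parameterization are hypothetical and chosen only for illustration.

```python
def gaussians_from_anchors(anchors, offsets):
    """Place k Gaussian centers around each coarse anchor.

    anchors: list of (x, y, z) coarse anchor positions.
    offsets: list with one entry per anchor, each a list of k (dx, dy, dz)
             local offsets (assumed here to come from a network).
    Returns a flat list of Gaussian centers, k per anchor, in anchor order.
    """
    if len(anchors) != len(offsets):
        raise ValueError("need one offset list per anchor")
    centers = []
    for anchor, anchor_offsets in zip(anchors, offsets):
        for off in anchor_offsets:
            # Each center is the coarse anchor shifted by a local offset.
            centers.append(tuple(a + o for a, o in zip(anchor, off)))
    return centers


# Hypothetical example: two anchors, k = 2 offsets each.
example_anchors = [(0.0, 0.0, 1.0), (1.0, 1.0, 2.0)]
example_offsets = [
    [(0.5, 0.0, 0.0), (0.0, 0.5, 0.0)],
    [(0.0, 0.0, -0.5), (0.0, 0.0, 0.5)],
]
example_centers = gaussians_from_anchors(example_anchors, example_offsets)
```

In this sketch the anchors fix the coarse placement and the offsets only adjust it locally, which is one plausible reading of "coarse-to-fine"; the actual prediction heads and parameterization are described in the paper, not here.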

Paper Structure

This paper contains 12 sections, 5 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: SHARE predicts geometry, appearance, and relative poses from sparse unposed images. (a) Compared to MVSplat, SHARE is more robust to pose noise, producing more accurate geometry and rendering. (b) This is achieved by estimating 3D geometry with an integrated canonical feature instead of aligning geometry predicted from different views.
  • Figure 2: SHARE Overview. SHARE aims to address geometric misalignment in pose-free 3D Gaussian splatting by jointly estimating relative poses and reconstructing 3D Gaussians in a canonical view. The framework consists of: (a) Ray-guided multi-view fusion module: Estimated Plücker rays serve as geometric priors in cost aggregation to construct pose-aware cost volumes, aligning multi-view features in a shared canonical space for improved geometric consistency. (b) Gaussian prediction module: Using fused multi-view features, anchor positions from $\textbf{V}_g$ guide the estimation of $k$ Gaussians per region via $\textbf{V}_f$, enabling fine-grained scene reconstruction with reduced misalignment.
  • Figure 3: Qualitative results on the DTU and RealEstate10K datasets. We visualize rendering results for multiple scenes from both datasets. Our method captures fine details with correct geometry.
  • Figure 4: Effect of pose embedding on DTU.
  • Figure 5: Anchor-aligned Gaussian prediction on RealEstate10K.
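
The Figure 2 caption refers to Plücker rays as the geometric prior. As a reminder of what that representation is, the sketch below builds the standard 6D Plücker coordinates (unit direction d, moment m = o × d) of a ray with origin o. This is the textbook definition, not code from the SHARE repository. The helper names `plucker_ray` and `pixel_ray`, and the simplifying assumption of a pinhole camera with identity rotation, are illustrative only.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a non-zero 3-vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def plucker_ray(origin, direction):
    """Return (d, m) as a 6-tuple: unit direction d and moment m = origin x d."""
    d = normalize(direction)
    m = cross(origin, d)
    return d + m


def pixel_ray(u, v, fx, fy, cx, cy, cam_center=(0.0, 0.0, 0.0)):
    """Pluecker ray through pixel (u, v).

    Assumes a pinhole camera with focal lengths (fx, fy), principal point
    (cx, cy), and identity rotation (an illustrative simplification).
    """
    direction = ((u - cx) / fx, (v - cy) / fy, 1.0)
    return plucker_ray(cam_center, direction)


# A ray along +z passing through the point (1, 0, 0).
example_ray = plucker_ray((1.0, 0.0, 0.0), (0.0, 0.0, 2.0))
```

Two properties make this representation convenient as a per-pixel embedding: the moment is always perpendicular to the direction, and it does not change when the origin slides along the ray, so the six numbers describe the line itself rather than a particular point on it.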