GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion
Li-Heng Chen, Zi-Xin Zou, Chang Liu, Tianjiao Jing, Yan-Pei Cao, Shi-Sheng Huang, Hongbo Fu, Hua Huang
TL;DR
The paper addresses pose-free surface reconstruction from unposed, often sparse, image sets by introducing Geometric Consistent Ray Diffusion (GCRayDiffusion). It over-parameterizes camera poses as neural bundle rays $\mathcal{R}$ with depth $d$ and employs a diffusion denoiser $g_{\phi}$ conditioned on the scene's triplane SDF $F_{\theta}$ to achieve multi-view-consistent pose estimation. On-surface sampling points $\mathcal{R}^d_t$ from the rays regularize the triplane-based SDF learning during diffusion, yielding geometrically coherent reconstructions. Experiments on Objaverse and GSO show superior pose accuracy and surface quality, particularly under sparse views, demonstrating robust pose-free 3D reconstruction with tight pose-geometry coupling.
Abstract
Accurate surface reconstruction from unposed images is crucial for efficient 3D object or scene creation. However, it remains challenging, particularly because the camera poses must be estimated jointly with the surface. Previous approaches have achieved impressive pose-free surface reconstruction results in dense-view settings, but can easily fail in sparse-view scenarios that lack sufficient visual overlap. In this paper, we propose a new technique for pose-free surface reconstruction, which follows triplane-based signed distance field (SDF) learning but regularizes the learning with explicit points sampled from a ray-based diffusion model for camera pose estimation. Our key contribution is a novel Geometric Consistent Ray Diffusion model (GCRayDiffusion), where we represent camera poses as neural bundle rays and regress the distribution of noisy rays via a diffusion model. More importantly, we further condition the denoising process of GCRayDiffusion on the triplane-based SDF of the entire scene, which provides effective 3D-consistent regularization to achieve multi-view-consistent camera pose estimation. Finally, we incorporate GCRayDiffusion into the triplane-based SDF learning by introducing on-surface geometric regularization from the sampling points of the neural bundle rays, which leads to highly accurate pose-free surface reconstruction results even for sparse-view inputs. Extensive evaluations on public datasets show that GCRayDiffusion achieves more accurate camera pose estimation than previous approaches, with geometrically more consistent surface reconstruction results, especially given sparse-view inputs.
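To make the ray-based pose representation and the on-surface regularization more concrete, the following is a minimal NumPy sketch under stated assumptions. It is not the paper's implementation: the function names (`camera_rays`, `on_surface_points`, `on_surface_loss`) are hypothetical, a pinhole camera with the convention x_cam = R x_world + t is assumed, and an analytic unit-sphere SDF stands in for the learned triplane SDF $F_{\theta}$. The diffusion denoiser is omitted entirely. The sketch only illustrates that points placed at depth $d$ along each ray should evaluate to zero under the SDF, and that errors in the rays or depths show up as a nonzero penalty.

```python
import numpy as np

def camera_rays(K, R, t, pixels):
    """Turn one camera pose into a bundle of world-space rays.
    Assumed convention: x_cam = R @ x_world + t (pinhole camera)."""
    pix_h = np.hstack([pixels, np.ones((pixels.shape[0], 1))])  # homogeneous pixels, (N, 3)
    dirs_cam = pix_h @ np.linalg.inv(K).T                       # back-project to camera frame
    dirs_world = dirs_cam @ R                                   # row-wise R^T @ d
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)
    origin = -R.T @ t                                           # camera center in world frame
    origins = np.tile(origin, (pixels.shape[0], 1))
    return origins, dirs_world

def on_surface_points(origins, dirs, depth):
    """Sample one point per ray at the given depth along that ray."""
    return origins + depth[:, None] * dirs

def sphere_sdf(points, radius=1.0):
    """Analytic SDF of a sphere; a stand-in for the learned triplane SDF."""
    return np.linalg.norm(points, axis=1) - radius

def on_surface_loss(points, sdf):
    """Points that lie on the surface should have SDF value zero."""
    return float(np.mean(np.abs(sdf(points))))

def ray_sphere_depth(origins, dirs, radius=1.0):
    """Ground-truth depth of the nearest ray/sphere hit (demo only)."""
    b = np.sum(origins * dirs, axis=1)
    c = np.sum(origins * origins, axis=1) - radius ** 2
    return -b - np.sqrt(b * b - c)

# Hypothetical camera three units in front of a unit sphere, looking along +z.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 3.0])
pixels = np.array([[32.0, 32.0], [40.0, 32.0], [32.0, 24.0]])

origins, dirs = camera_rays(K, R, t, pixels)
depth = ray_sphere_depth(origins, dirs)

loss_clean = on_surface_loss(on_surface_points(origins, dirs, depth), sphere_sdf)
loss_noisy = on_surface_loss(on_surface_points(origins, dirs, depth + 0.1), sphere_sdf)
print(loss_clean, loss_noisy)
```

With correct rays and depths the penalty vanishes up to floating-point error, while perturbed depths give a clearly positive value. In this simplified picture, that is the kind of signal that lets surface points sampled from the rays constrain the SDF, and the SDF constrain the rays in turn.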
