GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion

Li-Heng Chen, Zi-Xin Zou, Chang Liu, Tianjiao Jing, Yan-Pei Cao, Shi-Sheng Huang, Hongbo Fu, Hua Huang

TL;DR

The paper addresses pose-free surface reconstruction from unposed, often sparse, image sets by introducing Geometric Consistent Ray Diffusion (GCRayDiffusion). It over-parameterizes camera poses as neural bundle rays $\mathcal{R}$ with depth $d$ and employs a diffusion denoiser $g_{\phi}$ conditioned on the scene's triplane SDF $F_{\theta}$ to achieve multi-view-consistent pose estimation. On-surface sampling points $\mathcal{R}^d_t$ from the rays regularize the triplane-based SDF learning during diffusion, yielding geometrically coherent reconstructions. Experiments on Objaverse and GSO show superior pose accuracy and surface quality, particularly under sparse views, demonstrating robust pose-free 3D reconstruction with tight pose-geometry coupling.
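To make the ray parameterization concrete, the following minimal sketch turns one pinhole camera pose into a bundle of world-space rays, each of which can carry a depth $d$. It is an illustration only: the pinhole model, the world-to-camera convention for `R` and `t`, and the helper names `pose_to_rays` and `point_on_ray` are assumptions made here, not the paper's exact definition of neural bundle rays.

```python
import math


def mat_t_vec(R, v):
    """Multiply the transpose of a 3x3 matrix R by a 3-vector v."""
    return [sum(R[k][i] * v[k] for k in range(3)) for i in range(3)]


def pose_to_rays(R, t, focal, cx, cy, pixels):
    """Turn one pinhole camera (world-to-camera R, t) into world-space rays.

    Hypothetical helper: returns one (origin, unit_direction) pair per pixel.
    """
    # Camera centre in world coordinates: c = -R^T t.
    origin = [-x for x in mat_t_vec(R, t)]
    rays = []
    for u, v in pixels:
        # Back-project the pixel to a direction in the camera frame.
        d_cam = [(u - cx) / focal, (v - cy) / focal, 1.0]
        # Rotate into the world frame and normalize to unit length.
        d_world = mat_t_vec(R, d_cam)
        norm = math.sqrt(sum(x * x for x in d_world))
        rays.append((origin, [x / norm for x in d_world]))
    return rays


def point_on_ray(origin, direction, depth):
    """Explicit 3D sample along a ray: p = o + depth * d."""
    return [o + depth * d for o, d in zip(origin, direction)]


# Toy camera: identity rotation, placed 2 units behind the world origin.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rays = pose_to_rays(identity, [0.0, 0.0, 2.0], 100.0, 50.0, 50.0,
                    [(50.0, 50.0), (150.0, 50.0)])
sample = point_on_ray(rays[0][0], rays[0][1], 1.0)
```

Because every ray from one camera shares the same origin and the directions are tied together by a single rotation, a bundle of rays describes the pose redundantly, which is the sense in which the representation is over-parameterized.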

Abstract

Accurate surface reconstruction from unposed images is crucial for efficient 3D object or scene creation. However, it remains challenging, particularly because the camera poses must be estimated jointly. Previous approaches have achieved impressive pose-free surface reconstruction results in dense-view settings, but can easily fail in sparse-view scenarios without sufficient visual overlap. In this paper, we propose a new technique for pose-free surface reconstruction, which follows triplane-based signed distance field (SDF) learning but regularizes the learning with explicit points sampled from ray-based diffusion of camera pose estimation. Our key contribution is a novel Geometric Consistent Ray Diffusion model (GCRayDiffusion), where we represent camera poses as neural bundle rays and regress the distribution of noisy rays via a diffusion model. More importantly, we further condition the denoising process of GCRayDiffusion on the triplane-based SDF of the entire scene, which provides effective 3D-consistent regularization to achieve multi-view-consistent camera pose estimation. Finally, we incorporate GCRayDiffusion into the triplane-based SDF learning by introducing on-surface geometric regularization from the sampling points of the neural bundle rays, which leads to highly accurate pose-free surface reconstruction results even for sparse-view inputs. Extensive evaluations on public datasets show that our GCRayDiffusion achieves more accurate camera pose estimation than previous approaches, with geometrically more consistent surface reconstruction results, especially given sparse-view inputs.
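The on-surface regularization can be pictured with a small sketch: points sampled along rays at their estimated depths should lie on the surface, so the SDF evaluated there should be close to zero. The code below is a toy illustration under stated assumptions: an analytic sphere SDF stands in for the learned triplane-based SDF, the mean absolute SDF value is used as the penalty, and the names `sphere_sdf` and `on_surface_loss` are hypothetical. The abstract does not give the exact form of the loss.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Toy stand-in for the learned SDF: signed distance to an origin-centred sphere."""
    return math.sqrt(sum(x * x for x in p)) - radius


def on_surface_loss(rays, sdf):
    """Mean |sdf(o + depth * d)| over rays given as (origin, direction, depth).

    Assumed L1-style penalty: it is zero only when every sampled point
    lies exactly on the zero level set of the SDF.
    """
    total = 0.0
    for origin, direction, depth in rays:
        # Explicit sampling point along the ray at the given depth.
        point = [o + depth * d for o, d in zip(origin, direction)]
        total += abs(sdf(point))
    return total / len(rays)


rays = [
    ([0.0, 0.0, -2.0], [0.0, 0.0, 1.0], 1.0),  # ends exactly on the sphere
    ([0.0, 0.0, -2.0], [0.0, 0.0, 1.0], 1.5),  # overshoots the surface by 0.5
]
loss = on_surface_loss(rays, sphere_sdf)
```

In this toy setup the first ray contributes nothing and the second contributes its 0.5 overshoot, so a wrong depth or a wrong pose shows up directly as a nonzero penalty. This is one way to read the coupling between pose estimation and geometry that the abstract describes.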

Paper Structure

This paper contains 15 sections, 8 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: We achieve accurate pose-free neural surface learning with the aid of a novel geometric consistent ray diffusion, i.e., GCRayDiffusion, even from sparse view images (left column). Our GCRayDiffusion model formulates the images' camera poses as neural ray bundles and provides explicit sampling points generated during the denoiser processing (middle columns) to regularize the triplane-based SDF learning, achieving accurate surface reconstruction and camera pose estimation simultaneously (right column).
  • Figure 2: The pipeline of our GCRayDiffusion. Given sparse-view images $\mathcal{I}$, our approach extracts the image features $F_{\mathcal{I}}$ using an image encoder and feeds $F_{\mathcal{I}}$ to two sub-branches: (1) the Geometric Consistent Denoiser, which regresses the neural ray bundles $\mathcal{R}^{d}_t$ via SDF-conditioned ray-based diffusion to estimate the camera poses, and (2) Neural Surface Learning of a triplane-based SDF $F_{\theta}(\mathcal{R}^{d}_t)$. During ray-bundle denoising, we generate explicit sampling points from the neural ray bundles to regularize the neural surface learning, by querying their SDF values from the triplane-based SDF and locating their positions on the surface of the object shape, which leads to accurate surface reconstruction and camera pose estimation simultaneously.
  • Figure 3: The illustration of our neural bundle rays definition.
  • Figure 4: Qualitative surface reconstruction comparison on the Objaverse dataset for RelPose++, FORGE, DUSt3R, and our GCRayDiffusion (columns from left to right).
  • Figure 5: Qualitative surface reconstruction comparison on the GSO dataset for RelPose++, FORGE, DUSt3R, and our GCRayDiffusion (columns from left to right).
  • ...and 1 more figures