Zero-P-to-3: Zero-Shot Partial-View Images to 3D Object

Yuxuan Lin, Ruihang Chu, Zhenyu Chen, Xiao Tang, Lei Ke, Haoling Li, Yingji Zhong, Zhihao Li, Shiyong Liu, Xiaofei Wu, Jianzhuang Liu, Yujiu Yang

TL;DR

Zero-P-to-3 tackles 3D reconstruction from partial observations. It builds a coarse 3D Gaussian Splatting (3DGS) model from the partial observations and refines it with a training-free framework that combines a multi-view diffusion prior with a geometric prior and an image-restoration prior. Gaussian parameters are supervised through an iterative rotated-view refinement strategy and a composite loss $L_{\text{total}} = L_{\text{rec}} + \lambda L_{\text{LPIPS}}$. The priors are fused into a single noise prediction $\varepsilon_t = \varepsilon_{\text{MVD}} + w_{\text{HF}}(t) \varepsilon_{\text{HF}}^t + w_{\text{LF}}(t) \varepsilon_{\text{LF}}^t$, where $w_{\text{HF}}(t)$ and $w_{\text{LF}}(t)$ are time-dependent weights. Sampling follows DDIM with $x_{t-1} = x_t + (\sqrt{\alpha_{t-1}} - \sqrt{\alpha_t}) \frac{\varepsilon_t}{\sqrt{1-\alpha_t}}$, enabling coherent novel-view synthesis even in unseen regions. Experiments on synthetic Objaverse data and the real-world RefNeRF dataset show better reconstruction fidelity and multi-view consistency than prior methods, particularly in invisible regions.
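The fusion rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: plain lists stand in for the network's noise predictions, the linear schedule in the hypothetical helper `example_weights` is an illustrative choice (the summary only says the weights are time-dependent), and `ddim_step` uses the standard deterministic DDIM update (η = 0) of Song et al. rather than the compact expression quoted above, so the paper's exact update may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]  # a flattened latent or image, standing in for a tensor


def fuse_noise(eps_mvd: Vector, eps_hf: Vector, eps_lf: Vector,
               w_hf: float, w_lf: float) -> Vector:
    """eps_t = eps_MVD + w_HF(t) * eps_HF^t + w_LF(t) * eps_LF^t, elementwise."""
    return [m + w_hf * hf + w_lf * lf for m, hf, lf in zip(eps_mvd, eps_hf, eps_lf)]


def example_weights(t: int, num_steps: int) -> Tuple[float, float]:
    """Hypothetical linear schedule returning (w_hf, w_lf).

    The geometric (low-frequency) prior dominates at the noisy early steps
    (t close to num_steps); the restoration (high-frequency) prior dominates
    near the end of sampling (t close to 0).
    """
    s = t / num_steps
    return 1.0 - s, s


def ddim_step(x_t: Vector, eps_t: Vector,
              alpha_bar_t: float, alpha_bar_prev: float) -> Vector:
    """Standard deterministic DDIM update (eta = 0).

    alpha_bar_* are cumulative products of the noise schedule, which is what
    the DDIM paper denotes by alpha_t.
    """
    out = []
    for x, e in zip(x_t, eps_t):
        # Predict the clean sample, then re-noise it to the previous noise level.
        x0_pred = (x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        out.append(math.sqrt(alpha_bar_prev) * x0_pred
                   + math.sqrt(1.0 - alpha_bar_prev) * e)
    return out
```

One sampling step would call `example_weights(t, T)`, fuse the three predictions with `fuse_noise`, and pass the result to `ddim_step`. Keeping the weights as plain arguments makes it easy to swap in whatever schedule the paper actually uses.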

Abstract

Generative 3D reconstruction shows strong potential for incomplete observations. While sparse-view and single-image reconstruction are well researched, reconstruction from partial observations remains underexplored. In this setting, dense views are available only within a specific angular range, and all other perspectives are inaccessible. The task presents two main challenges: (i) Limited view range: observations confined to a narrow angular scope defeat traditional interpolation techniques, which require evenly distributed perspectives. (ii) Inconsistent generation: views generated for invisible regions often lack coherence with the visible regions and with each other, compromising reconstruction consistency. To address these challenges, we propose Zero-P-to-3, a novel training-free approach that integrates local dense observations with multi-source priors for reconstruction. Our method introduces a fusion-based strategy that aligns these priors during DDIM sampling, generating multi-view consistent images to supervise invisible views. We further design an iterative refinement strategy that exploits the object's geometric structure to improve reconstruction quality. Extensive experiments on multiple datasets show that our method outperforms state-of-the-art methods, especially in invisible regions.

Paper Structure

This paper contains 15 sections, 8 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Zero-P-to-3 reconstructs 3D objects where visible regions are confined to a limited field-of-view. We first reconstruct a coarse 3D Gaussian Splatting (3DGS) model from partial observations, then iteratively refine it using multiple priors to obtain the final 3DGS model.
  • Figure 2: Visualizations of input images (top), masks (middle), and processed images (bottom) from the RefNeRF dataset (Verbin et al., 2022).
  • Figure 3: Overview of Zero-P-to-3 for 3D reconstruction from partial observations. Starting with input partial images, the system first constructs a coarse 3D Gaussian Splatting (3DGS) model, then renders both visible and invisible regions. These renderings are enhanced through Multi-Prior Score Fusion to generate high-quality output images that serve as supervision for refining the 3DGS model. Through iterative refinement using L1 and perceptual losses, the system produces a comprehensive 3D representation that effectively bridges the gap between limited observations and complete object reconstruction.
  • Figure 4: The Multi-Prior Score Fusion framework combines three components to generate novel view images: multi-view diffusion ($\varepsilon_{\text{MVD}}$) for fusing reference view information, a geometric prior ($\varepsilon_{\text{LF}}^t$) for structural consistency, and an image restoration prior ($\varepsilon_{\text{HF}}^t$) for enhancing details. These components process both visible and invisible regions from reference views and are combined with time-dependent weights to produce the noise prediction $\varepsilon_t$ used in DDIM sampling for the final image generation.
  • Figure 5: Schematic diagram of rotated views. Yellow lines indicate visible views used as conditional inputs for fusion-based inference, while blue lines represent invisible views that provide geometric and coarse texture information requiring restoration. Starting from initial angle $\Theta_0$, viewpoints are rotated in batches of $N$ to angles $\Theta_1$, $\Theta_2$, etc., progressively densifying the viewpoints until complete dense supervision of the object is achieved.
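The refinement loop described in the Figure 3 and Figure 5 captions can be approximated with two small helpers. This is a sketch under stated assumptions rather than the authors' procedure: `rotated_view_batches` is a hypothetical reading of Figure 5 in which each round renders $N$ evenly spaced azimuths and the next round starts where the previous one ended, and `total_loss` implements $L_{\text{total}} = L_{\text{rec}} + \lambda L_{\text{LPIPS}}$ from the TL;DR with $L_{\text{rec}}$ taken as the L1 error mentioned in the Figure 3 caption. Because LPIPS requires a pretrained network, the perceptual term is passed in as a callable, and the default `lam = 0.2` is an illustrative value, not one reported in the paper.

```python
import math
from typing import Callable, List, Sequence


def rotated_view_batches(theta0: float, n: int, step: float) -> List[List[float]]:
    """Hypothetical schedule of azimuths (degrees) for iterative refinement.

    Round k renders n views spaced `step` degrees apart, starting where the
    previous round stopped, until the full 360-degree circle is covered.
    """
    num_rounds = math.ceil(360.0 / (n * step))
    batches = []
    for k in range(num_rounds):
        start = theta0 + k * n * step  # plays the role of Theta_k in Figure 5
        batches.append([(start + i * step) % 360.0 for i in range(n)])
    return batches


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error, used here as the reconstruction term L_rec."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def total_loss(pred: Sequence[float], target: Sequence[float],
               perceptual_distance: Callable[[Sequence[float], Sequence[float]], float],
               lam: float = 0.2) -> float:
    """L_total = L_rec + lam * L_LPIPS; perceptual_distance stands in for LPIPS."""
    return l1_loss(pred, target) + lam * perceptual_distance(pred, target)
```

In a full loop, each batch would be rendered from the current 3DGS model, enhanced with Multi-Prior Score Fusion, and then used as the target in `total_loss` to update the Gaussian parameters. When 360 is not a multiple of `n * step`, the final round wraps past `theta0` and revisits a few azimuths.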