Adaptive 3D Reconstruction via Diffusion Priors and Forward Curvature-Matching Likelihood Updates

Seunghyeok Shin, Dabin Kim, Hongki Lim

TL;DR

This work tackles image-to-3D reconstruction by decoupling a learned diffusion prior over colored point clouds from the measurement model, enabling adaptive, adjoint-free likelihood updates via Forward Curvature-Matching (FCM) during diffusion sampling. By replacing fixed-step updates with a principled curvature-based step-size strategy, the method achieves accurate single- and multi-view reconstructions while reducing neural function evaluations. The approach provides theoretical guarantees on loss decrease and contraction, and demonstrates strong empirical performance on ShapeNet and CO3D, with improved F-score and reduced CD/EMD compared to prior diffusion-based methods. Its modality-agnostic measurement operator and retraining-free adaptability make it practical for diverse 3D reconstruction tasks, including depth-map and multi-view scenarios.

Abstract

Reconstructing high-quality point clouds from images remains challenging in computer vision. Existing generative-model-based approaches, particularly diffusion-model approaches that directly learn the posterior, may suffer from inflexibility -- they require conditioning signals during training, support only a fixed number of input views, and need complete retraining for different measurements. Recent diffusion-based methods have attempted to address this by combining prior models with likelihood updates, but they rely on heuristic fixed step sizes for the likelihood update that lead to slow convergence and suboptimal reconstruction quality. We advance this line of approach by integrating our novel Forward Curvature-Matching (FCM) update method with diffusion sampling. Our method dynamically determines optimal step sizes using only forward automatic differentiation and finite-difference curvature estimates, enabling precise optimization of the likelihood update. This formulation enables high-fidelity reconstruction from both single-view and multi-view inputs, and supports various input modalities through simple operator substitution -- all without retraining. Experiments on ShapeNet and CO3D datasets demonstrate that our method achieves superior reconstruction quality at matched or lower NFEs, yielding higher F-score and lower CD and EMD, validating its efficiency and adaptability for practical applications. Code is available at https://github.com/Seunghyeok0715/FCM
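The idea of choosing a step size from curvature using only forward evaluations can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the authors' FCM rule: it assumes a squared-error measurement loss $\frac{1}{2}\|\mathbf{y}-R(\mathbf{X})\|_2^2$, a search direction supplied by the caller, and central finite differences of the loss for both slope and curvature. All function names (`measurement_loss`, `curvature_matched_step`, `orthographic_view`, `depth_view`) are hypothetical and do not come from the paper or its repository.

```python
def measurement_loss(x, y, operator):
    """Squared-error loss 0.5 * ||y - operator(x)||^2 on flat lists of floats."""
    pred = operator(x)
    return 0.5 * sum((yi - pi) ** 2 for yi, pi in zip(y, pred))


def curvature_matched_step(x, y, operator, direction, eps=1e-3, fallback=0.0):
    """Step size that minimizes a local quadratic model of the loss along `direction`.

    Slope and curvature are central finite differences of the loss, so only
    forward evaluations of `operator` are used (no adjoint or backward pass).
    """
    def loss_at(t):
        shifted = [xi + t * di for xi, di in zip(x, direction)]
        return measurement_loss(shifted, y, operator)

    f_minus, f_zero, f_plus = loss_at(-eps), loss_at(0.0), loss_at(eps)
    slope = (f_plus - f_minus) / (2.0 * eps)
    curvature = (f_plus - 2.0 * f_zero + f_minus) / (eps * eps)
    if curvature <= 0.0:
        return fallback  # the quadratic model has no minimizer along this direction
    return -slope / curvature


def orthographic_view(point):
    """Toy image-like operator (assumption): drop the depth coordinate of a 3D point."""
    return [point[0], point[1]]


def depth_view(point):
    """Toy depth-like operator (assumption): keep only the depth coordinate."""
    return [point[2]]


x = [0.0, 0.0, 1.0]

# Image-like measurement: move along the x-axis toward the observed 2D position.
y = [2.0, -1.0]
eta = curvature_matched_step(x, y, orthographic_view, [1.0, 0.0, 0.0])
x_new = [xi + eta * di for xi, di in zip(x, [1.0, 0.0, 0.0])]

# Depth measurement: same update code, only the operator is swapped.
y_depth = [3.0]
eta_depth = curvature_matched_step(x, y_depth, depth_view, [0.0, 0.0, 1.0])
```

Because the toy loss is quadratic along each direction, the finite-difference model is exact up to rounding and the returned step lands on the minimizer along that line. Passing the operator as an argument mirrors the abstract's point that a new modality only requires substituting the measurement operator, not retraining.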

Paper Structure

This paper contains 35 sections, 7 theorems, 33 equations, 22 figures, 8 tables, 1 algorithm.

Key Result

Theorem 3.4

Let $c = \min\{\frac{\eta_{\text{FCM}}}{2L}, \frac{1}{8L}\}$. Our FCM algorithm ensures:

Figures (22)

  • Figure 1: Left: Visualization of our diffusion process from random noise ($T=256$) to final reconstructions ($T=0$) for various object categories. Right: Comparison of point cloud reconstructions between Ground Truth, previous methods (PC$^2$ [Melas-Kyriazi et al., 2023], BDM [Xu et al., 2024]), and our approach. Our method achieves higher-fidelity reconstructions with better F-scores (0.382) than existing approaches while using fewer function evaluations, particularly excelling at preserving fine structural details.
  • Figure 2: Overview of our FCM-guided point cloud diffusion framework. The sampling phase (left) shows how the diffusion model progressively transforms random noise $\mathbf{X}_t$ into structured point clouds through DDIM sampling. The FCM likelihood update (right) illustrates our key innovation: dynamically determining optimal step sizes for the likelihood gradient $\nabla\|\mathbf{y}-R(\hat{\mathbf{X}}_{0|t})\|_2$. This principled optimization approach enables high-fidelity reconstruction that accurately matches input images while requiring fewer function evaluations than existing methods.
  • Figure 3: Qualitative comparison of single-view 3D reconstructions on the ShapeNet dataset. The figure displays point cloud reconstructions from our method, PC$^2$, and BDM for various object categories, highlighting the superior detail and accuracy of our approach.
  • Figure 4: Comparison of rendered images from reconstructed point clouds on the CO3D dataset. The figure shows renderings from our method and PC$^2$, illustrating the higher fidelity and better preservation of details in our reconstructions.
  • Figure 5: Reconstruction from depth maps. The figure showcases point cloud reconstructions generated from depth map inputs.
  • ...and 17 more figures
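
The captions and abstract report F-score and Chamfer distance (CD) without defining them here. As a rough illustration, the sketch below computes both for small point clouds. It assumes one common convention (unsquared Euclidean nearest-neighbor distances, averaged in each direction for CD, and a caller-chosen distance threshold for F-score); the paper's exact definitions, scaling, and threshold may differ, and the function names are hypothetical.

```python
import math


def nearest_distances(source, target):
    """Distance from each point in `source` to its nearest neighbor in `target`."""
    return [min(math.dist(p, q) for q in target) for p in source]


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    return sum(d_pred) / len(d_pred) + sum(d_gt) / len(d_gt)


def f_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a distance threshold."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    # Precision: fraction of predicted points close to the ground truth.
    precision = sum(d <= threshold for d in d_pred) / len(d_pred)
    # Recall: fraction of ground-truth points close to the prediction.
    recall = sum(d <= threshold for d in d_gt) / len(d_gt)
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Toy clouds: one predicted point matches exactly, the other is off by 0.5 in depth.
gt_cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]

cd = chamfer_distance(pred_cloud, gt_cloud)
fs = f_score(pred_cloud, gt_cloud, threshold=0.1)
```

Higher F-score and lower CD both indicate a closer match. The brute-force nearest-neighbor search here is quadratic in the number of points and is only meant for small examples.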

Theorems & Definitions (14)

  • Theorem 3.4: Guaranteed Loss Decrease
  • Proposition 3.5: Contraction Preservation
  • Lemma A.3.1: Step Size Bounds
  • proof
  • Remark A.3.2
  • Lemma A.3.3: Firmly Non-Expansive Gradient Step
  • proof
  • Theorem A.3.4: Guaranteed Loss Decrease
  • proof
  • Corollary A.3.5: Gradient Norm Convergence
  • ...and 4 more