SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization

Mae Younes, Amine Ouasfi, Adnane Boukhayma

TL;DR

The paper reconstructs precise 3D geometry and view-dependent appearance from only a few images by learning a neural Signed Distance Function (SDF) $f_\theta$ and a radiance field $g_\phi$ within a differentiable volumetric rendering framework. Its key innovation is a Taylor-expansion-inspired geometric regularization that encourages the SDF to be linear near the surface, combined with learning-free multi-view stereo (MVS) cues and a progressive hash encoding that enable fast training without pretrained priors. The method reports state-of-the-art results in both surface reconstruction and novel-view synthesis from sparse views on standard benchmarks, with training times under 10 minutes. Because it needs only a handful of input views and no pretrained priors, the approach lowers the data requirements for high-fidelity 3D capture with neural implicit representations.

Abstract

We present a novel approach for recovering 3D shape and view-dependent appearance from a few colored images, enabling efficient 3D reconstruction and novel view synthesis. Our method learns an implicit neural representation in the form of a Signed Distance Function (SDF) and a radiance field. The model is trained progressively through ray-marching-enabled volumetric rendering, and regularized with learning-free multi-view stereo (MVS) cues. Key to our contribution is a novel implicit neural shape function learning strategy that encourages our SDF field to be as linear as possible near the level-set, hence robustifying the training against noise emanating from the supervision and regularization signals. Without using any pretrained priors, our method, called SparseCraft, achieves state-of-the-art performance both in novel-view synthesis and reconstruction from sparse views in standard benchmarks, while requiring less than 10 minutes for training.
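
To make the "as linear as possible near the level-set" idea concrete, the following is a minimal NumPy sketch, not the paper's actual loss. It assumes the first-order Taylor expansion of an SDF $f$ around a surface point $\mathbf{p}$ with unit normal $\mathbf{n}$, namely $f(\mathbf{q}) \approx f(\mathbf{p}) + \mathbf{n} \cdot (\mathbf{q} - \mathbf{p})$ with $f(\mathbf{p}) = 0$, and measures how far a field deviates from that prediction. The analytic sphere standing in for $f_\theta$, the random surface points standing in for MVS cues, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def sphere_sdf(x, radius=1.0):
    """Signed distance to a sphere centred at the origin (toy stand-in for f_theta)."""
    return np.linalg.norm(x, axis=-1) - radius

def sphere_sdf_grad(x):
    """Analytic gradient of the sphere SDF: the unit vector x / ||x||."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def linearization_residual(sdf, p, n, q):
    """Mean squared gap between sdf(q) and its first-order Taylor prediction
    around surface points p with unit normals n (assumes sdf(p) = 0)."""
    predicted = np.sum(n * (q - p), axis=-1)  # n . (q - p)
    return np.mean((sdf(q) - predicted) ** 2)

rng = np.random.default_rng(0)

# Hypothetical "MVS" points: random samples on the unit sphere, with exact normals.
p = rng.normal(size=(256, 3))
p /= np.linalg.norm(p, axis=-1, keepdims=True)
n = sphere_sdf_grad(p)

# Query points perturbed slightly (near the surface) and strongly (far from it).
near = p + 0.01 * rng.normal(size=p.shape)
far = p + 0.5 * rng.normal(size=p.shape)

res_near = linearization_residual(sphere_sdf, p, n, near)
res_far = linearization_residual(sphere_sdf, p, n, far)
print(f"residual near surface: {res_near:.3e}")
print(f"residual far from surface: {res_far:.3e}")
```

In this toy setting the residual is tiny for queries close to the surface and grows with distance, because the second-order (curvature) term of the expansion is no longer negligible. A regularizer of this shape, applied to a learned SDF at points around noisy MVS samples, would penalize deviations from local linearity; how the paper actually weights and samples such terms is not specified in this summary.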

Paper Structure (14 sections, 15 equations, 10 figures, 6 tables)

Figures (10)

  • Figure 1: Overview: In this toy example, we illustrate inference given 4 samples $\{\mathbf{r}(t)\}$ on a ray $\mathbf{r}$ (where the last hash resolution is not active yet). Dashed arrows symbolize losses operating mid-training. SparseCraft leverages differentiable volumetric rendering to learn an SDF-based implicit representation given a few images, using MVS cues as regularization (losses in red).
  • Figure 2: Qualitative comparison of surface reconstruction in DTU from 3 views. SparseNeuS and VolRecon use deep data-driven priors, whereas we do not.
  • Figure 3: Qualitative comparison of surface reconstruction in BMVS from 3 views.
  • Figure 3: Numerical Ablation of our Taylor based geometric regularization losses.
  • Figure 4: Qualitative comparison of surface reconstruction on T&T from 24 uniformly sampled views.
  • ...and 5 more figures