
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Zixuan Huang, Mark Boss, Aaryaman Vasishta, James M. Rehg, Varun Jampani

TL;DR

SPAR3D tackles single-view 3D reconstruction by combining a diffusion-based point-sampling stage with an image-conditioned meshing stage. The two-stage design models occluded geometry probabilistically while still producing fast, high-fidelity reconstructions, and its lightweight sparse point cloud intermediate representation enables interactive edits. The method achieves state-of-the-art accuracy with an inference time of about 0.7 seconds and generalizes well to in-the-wild and AI-generated inputs. This offers a practical, scalable path toward high-quality, editable 3D assets for AR, VFX, and design pipelines.
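Because the intermediate representation is a sparse point cloud, a user edit can be as simple as changing the points before the meshing stage runs again. The sketch below is illustrative only: it assumes points are stored as (x, y, z) tuples, the helpers `delete_in_box` and `translate_in_box` are hypothetical, and re-running the second stage on the edited cloud is only indicated in a comment.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def in_box(p: Point, lo: Point, hi: Point) -> bool:
    """True if p lies inside the axis-aligned box [lo, hi] (inclusive)."""
    return all(l <= c <= h for c, l, h in zip(p, lo, hi))


def delete_in_box(points: List[Point], lo: Point, hi: Point) -> List[Point]:
    """Hypothetical edit: drop every point inside the box (e.g. remove a part)."""
    return [p for p in points if not in_box(p, lo, hi)]


def translate_in_box(points: List[Point], lo: Point, hi: Point,
                     offset: Point) -> List[Point]:
    """Hypothetical edit: move only the points inside the box by `offset`."""
    return [
        tuple(c + o for c, o in zip(p, offset)) if in_box(p, lo, hi) else p
        for p in points
    ]


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.9, 0.1, 0.2)]
    edited = delete_in_box(cloud, (0.4, 0.4, 0.4), (0.6, 0.6, 0.6))
    edited = translate_in_box(edited, (0.8, 0.0, 0.0), (1.0, 0.2, 0.3), (0.0, 0.0, 0.1))
    print(edited)
    # The edited cloud would then be passed, together with the input image,
    # to the meshing stage again (not shown; assumption).
```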

Abstract

We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. Regression methods efficiently infer visible surfaces but struggle with occluded regions. Generative methods handle uncertain regions better by modeling distributions, but they are computationally expensive, and their outputs are often misaligned with the visible surfaces. In this paper, we present SPAR3D, a novel two-stage approach that aims to combine the strengths of both directions. The first stage of SPAR3D generates a sparse 3D point cloud using a lightweight point diffusion model with fast sampling. The second stage uses both the sampled point cloud and the input image to create highly detailed meshes. This two-stage design enables probabilistic modeling of the ill-posed single-image 3D task while maintaining high computational efficiency and high output fidelity. Using point clouds as an intermediate representation further allows for interactive user edits. Evaluated on diverse datasets, SPAR3D outperforms previous state-of-the-art methods, with an inference time of 0.7 seconds. Project page with code and model: https://spar3d.github.io
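The abstract does not specify which sampler the first-stage point diffusion model uses. As a generic illustration of how a diffusion model turns Gaussian noise into a point cloud, the sketch below implements standard DDPM ancestral sampling (Ho et al., 2020) over an N x 3 set of points. The noise predictor `toy_eps_model`, the linear beta schedule, and the step count are placeholder assumptions; in SPAR3D the predictor would be a learned network conditioned on the input image.

```python
import math
import random
from typing import Callable, List

Cloud = List[List[float]]  # N points, each [x, y, z]


def linear_betas(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear noise schedule (an assumption; other schedules are common). Needs T > 1."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def ddpm_sample(eps_model: Callable[[Cloud, int], Cloud], n_points: int,
                T: int = 50, seed: int = 0) -> Cloud:
    """DDPM ancestral sampling: start from N(0, I) and denoise for T steps.

    Update: x_{t-1} = (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t) + sigma_t * z,
    with sigma_t = sqrt(beta_t) and no added noise on the final step.
    """
    rng = random.Random(seed)
    betas = linear_betas(T)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_points)]
    for t in reversed(range(T)):
        eps = eps_model(x, t)  # predicted noise, same shape as x
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        x = [
            [(xi - coef * ei) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(p, e)]
            for p, e in zip(x, eps)
        ]
    return x


def toy_eps_model(x: Cloud, t: int) -> Cloud:
    """Hypothetical stand-in for a learned, image-conditioned noise predictor."""
    return [[0.0, 0.0, 0.0] for _ in x]


if __name__ == "__main__":
    cloud = ddpm_sample(toy_eps_model, n_points=512, T=50)
    print(len(cloud), cloud[0])
```

Few-step or distilled samplers are often used when sampling speed matters; the loop above only shows the basic reverse process.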

Paper Structure (40 sections, 3 equations, 12 figures, 3 tables)

Figures (12)

  • Figure 1: We present SPAR3D, a state-of-the-art method that reconstructs high-quality 3D meshes from single-view images. SPAR3D reconstructs an object in 0.7 seconds and supports interactive user edits.
  • Figure 2: SPAR3D Overview. Conditioned on the input image, SPAR3D first uses a point diffusion model to generate a sparse point cloud. A triplane transformer then combines the sampled point cloud with image features to produce high-resolution triplane features, which are queried to reconstruct the geometry, texture, and illumination of the object in the image.
  • Figure 3: Our Differentiable Renderer. We estimate geometry, albedo, lighting, and normal maps from the triplane, and metallic/roughness values from the image. We rasterize and interpolate these values as input to our shader (omitted here for simplicity). The shader uses the Disney BRDF (Burley 2012) and performs Monte Carlo integration. We further perform visibility testing to improve shadow modeling. Finally, we compare the rendered image with the ground-truth (GT) image and minimize the rendering loss.
  • Figure 4: Shadow Modeling. We perform visibility testing in screen space by marching along sampled rays. If any point along a ray has a ray depth farther away than the corresponding value in the depth map, the entire ray is considered shadowed.
  • Figure 5: Qualitative Comparison. We compare SPAR3D to other state-of-the-art methods visually. SPAR3D not only aligns better with the visible surfaces from images, but also generates higher-quality geometries and textures for the occluded surfaces.
  • ...and 7 more figures
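Figure 2 states that the triplane features are queried to reconstruct the object. A common way to query a triplane, assumed here because the captions do not specify the aggregation, is to project a 3D point in [-1, 1]^3 onto the XY, XZ, and YZ planes, bilinearly interpolate a feature vector from each plane, and sum the three results. The plane layout, the coordinate convention, the summation, and the function names are illustrative assumptions.

```python
from typing import List, Tuple

Plane = List[List[List[float]]]  # [H][W][C] grid of feature vectors


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample `plane` at continuous coords u, v in [-1, 1] (corner-aligned)."""
    h, w = len(plane), len(plane[0])
    u, v = max(-1.0, min(1.0, u)), max(-1.0, min(1.0, v))  # clamp to the plane
    # Map [-1, 1] to pixel coordinates [0, size - 1].
    x = (u + 1.0) * 0.5 * (w - 1)
    y = (v + 1.0) * 0.5 * (h - 1)
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    return [
        (1 - fx) * (1 - fy) * plane[y0][x0][k]
        + fx * (1 - fy) * plane[y0][x1][k]
        + (1 - fx) * fy * plane[y1][x0][k]
        + fx * fy * plane[y1][x1][k]
        for k in range(len(plane[0][0]))
    ]


def query_triplane(xy: Plane, xz: Plane, yz: Plane,
                   p: Tuple[float, float, float]) -> List[float]:
    """Project p onto the three axis-aligned planes and sum the sampled features."""
    x, y, z = p
    fa, fb, fc = bilinear(xy, x, y), bilinear(xz, x, z), bilinear(yz, y, z)
    return [a + b + c for a, b, c in zip(fa, fb, fc)]


if __name__ == "__main__":
    xy = [[[0.0], [1.0]], [[2.0], [3.0]]]
    zero = [[[0.0], [0.0]], [[0.0], [0.0]]]
    print(query_triplane(xy, zero, zero, (0.0, 0.0, 0.0)))
```

In a full model, the summed feature would typically be decoded by small MLPs into quantities such as density, albedo, or normals; that decoder is not shown.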
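Figure 3 notes that the shader performs Monte Carlo integration with the Disney BRDF. The Disney BRDF is too long for a short example, so the sketch below swaps in a Lambertian (diffuse-only) BRDF, f = albedo / pi, and estimates outgoing radiance under a constant environment light using uniform hemisphere sampling. For this setup the exact answer is albedo x L_env, which makes the estimator easy to check. The function names and the constant-light assumption are illustrative only.

```python
import math
import random
from typing import Tuple


def sample_uniform_hemisphere(rng: random.Random) -> Tuple[float, float, float]:
    """Uniformly sample a direction on the +z hemisphere; pdf = 1 / (2*pi)."""
    cos_theta = rng.random()  # for uniform hemisphere sampling, cos(theta) ~ U[0, 1)
    sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
    phi = 2.0 * math.pi * rng.random()
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)


def mc_diffuse_radiance(albedo: float, env_radiance: float,
                        n_samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of L_o = integral over the hemisphere of f * L_i * cos(theta).

    Uses a Lambertian BRDF (a stand-in for the Disney BRDF) and a constant
    environment light; each sample contributes f * L_i * cos(theta) / pdf.
    """
    rng = random.Random(seed)
    brdf = albedo / math.pi
    pdf = 1.0 / (2.0 * math.pi)
    total = 0.0
    for _ in range(n_samples):
        wi = sample_uniform_hemisphere(rng)
        cos_theta = wi[2]      # surface normal is +z
        li = env_radiance      # constant incoming radiance (assumption)
        total += brdf * li * cos_theta / pdf
    return total / n_samples


if __name__ == "__main__":
    # The exact value for this setup is albedo * env_radiance.
    print(mc_diffuse_radiance(albedo=0.5, env_radiance=2.0))
```

A renderer would multiply each sample by a visibility term (0 if the sampled direction is occluded) to produce shadows, which is where the visibility test of Figure 4 comes in.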
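Figure 4 describes the visibility test: march along a sampled ray in screen space, and mark the whole ray as shadowed if any sample lies behind the depth map. A minimal sketch, assuming depth increases away from the camera and that the ray has already been projected to screen space as a start pixel, a per-step pixel offset, and a per-step depth change. The function `ray_is_shadowed`, the step count, and the small depth bias are hypothetical choices, not values from the paper.

```python
from typing import List


def ray_is_shadowed(depth_map: List[List[float]], x0: float, y0: float, d0: float,
                    dx: float, dy: float, dd: float, n_steps: int = 16,
                    bias: float = 1e-3) -> bool:
    """Screen-space visibility test for one ray.

    Starting at pixel (x0, y0) with depth d0, step by (dx, dy) pixels and by dd
    in depth. If any sample's ray depth is farther than the stored depth (the
    sample is hidden behind visible geometry), the entire ray is shadowed.
    """
    h, w = len(depth_map), len(depth_map[0])
    for i in range(1, n_steps + 1):
        px = int(round(x0 + i * dx))
        py = int(round(y0 + i * dy))
        if not (0 <= px < w and 0 <= py < h):
            break  # ray left the screen: no occluder found within the image
        ray_depth = d0 + i * dd
        if ray_depth > depth_map[py][px] + bias:
            return True
    return False


if __name__ == "__main__":
    # One-row depth map with an occluder (depth 1.0) at pixel 4.
    row = [5.0, 5.0, 5.0, 5.0, 1.0, 5.0, 5.0, 5.0]
    print(ray_is_shadowed([row], 0, 0, 5.0, 1, 0, -0.5))
```

Treating the whole ray as shadowed on the first hit keeps the test cheap, at the cost of ignoring partial occlusion along the ray.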