Sur2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images

Zhangjin Huang, Zhihao Liang, Haojie Zhang, Yangkai Lin, Kui Jia

TL;DR

Sur2f addresses the ill-posed task of multi-view surface reconstruction by integrating an implicit SDF and an explicit surrogate mesh into a single hybrid representation. It synchronizes the two via deformation driven by the SDF and unifies their rendering with a shared neural shader, enabling dual supervision from image data and improved sampling efficiency through surface-guided ray sampling. The method achieves state-of-the-art geometry accuracy and fast convergence on benchmarks like DTU, and demonstrates strong performance in indoor/outdoor scenes and inverse rendering setups, while supporting real-time rendering. This hybrid approach effectively leverages the strengths of both explicit and implicit surfaces, enabling robust 3D reconstruction and downstream applications such as text-to-3D generation and relighting.

Abstract

Multi-view surface reconstruction is an ill-posed, inverse problem in 3D vision research. It involves modeling the geometry and appearance with appropriate surface representations. Most existing methods rely either on explicit meshes, using surface rendering of meshes for reconstruction, or on implicit field functions, using volume rendering of the fields for reconstruction. The two types of representations in fact have their respective merits. In this work, we propose a new hybrid representation, termed Sur2f, aiming to better benefit from both representations in a complementary manner. Technically, we learn two parallel streams of an implicit signed distance field and an explicit surrogate surface Sur2f mesh, and unify volume rendering of the implicit signed distance function (SDF) and surface rendering of the surrogate mesh with a shared, neural shader; the unified shading promotes their convergence to the same, underlying surface. We synchronize learning of the surrogate mesh by driving its deformation with functions induced from the implicit SDF. In addition, the synchronized surrogate mesh enables surface-guided volume sampling, which greatly improves the sampling efficiency per ray in volume rendering. We conduct thorough experiments showing that Sur2f outperforms existing reconstruction methods and surface representations, including hybrid ones, in terms of both recovery quality and recovery efficiency.
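The surface-guided volume sampling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the analytic sphere stands in for the surrogate mesh, and the names `intersect_sphere`, `sample_depths`, and the parameters `band`, `near`, and `far` are hypothetical. The idea shown is only that, once a ray's hit depth on the surrogate surface is known, samples can be concentrated in a narrow band around that depth instead of being spread over the whole ray.

```python
import math


def intersect_sphere(origin, direction, radius=1.0):
    """Nearest positive hit depth of a ray with an origin-centred sphere, or None.

    Stand-in for ray-casting the surrogate mesh (an assumption of this sketch).
    `direction` is assumed to be unit length.
    """
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None  # the ray misses the surface
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None


def sample_depths(origin, direction, n_samples=8, band=0.1, near=0.0, far=4.0):
    """Return evenly spaced sample depths along the ray.

    If the ray hits the surrogate surface, samples cover only
    [t_hit - band, t_hit + band]; otherwise they fall back to [near, far].
    """
    t_hit = intersect_sphere(origin, direction)
    if t_hit is None:
        lo, hi = near, far
    else:
        lo, hi = max(near, t_hit - band), min(far, t_hit + band)
    step = (hi - lo) / (n_samples - 1)
    return [lo + i * step for i in range(n_samples)]


# A ray that hits the unit sphere at depth 2 is sampled within [1.9, 2.1].
hit_depths = sample_depths((0.0, 0.0, -3.0), (0.0, 0.0, 1.0))
# A ray that misses falls back to the full [near, far] interval.
miss_depths = sample_depths((0.0, 0.0, -3.0), (0.0, 1.0, 0.0))
```

With the same sample budget, the hit ray places all eight samples within 0.1 of the surface, which is the per-ray efficiency gain the abstract refers to.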

Paper Structure (19 sections, 18 equations, 16 figures, 4 tables)

Figures (16)

  • Figure 1: The proposed Sur$^2$f and its applications.
  • Figure 2: Overview of Sur$^2$f. Sur$^2$f learns parallel streams of an implicit SDF $f$ and a surrogate surface mesh $\widehat{\mathcal{S}} = \{ \widehat{\mathcal{V}}, \widehat{\mathcal{F}} \}$. Deformation of the vertices $\widehat{\mathcal{V}}$ of the surrogate $\widehat{\mathcal{S}}$ is driven by functions induced from $f$, which enforces synchronization of the two representations (cf. Section \ref{subsec:surrogate}). We use a shared neural shader for both SDF-induced volume rendering and surface rendering of $\widehat{\mathcal{S}}$ (cf. Section \ref{subsec:commonshading}), which enables dual supervision from image observations. We also use surface-guided volume sampling (i.e., guided by $\widehat{\mathcal{S}}$) to improve the sampling efficiency (and consequently the reconstruction quality) of volume rendering (cf. Section \ref{subsec:sampling}). During inference, we use marching cubes (Lorensen and Cline, 1987) to extract a mesh $\mathcal{S}$ from $f$; with the already learned neural shader, we achieve photorealistic, real-time rendering from Sur$^2$f.
  • Figure 3: Visualization of the synchronized deformation of the surrogate mesh. Each error map shows the distance between the surrogate mesh and the mesh extracted from the SDF $f$ at a certain training iteration.
  • Figure 4: Qualitative comparisons on the DTU dataset (Jensen et al., 2014). Surface details are zoomed in and color-coded by surface normals for a better view. For each scene, the upper row shows geometry reconstruction and the lower row shows image synthesis.
  • Figure 5: Visualization of images synthesized by Volume Rendering (VR) and Surface Rendering (SR) with and without learning a shared shading network. The error maps measure the absolute per-pixel difference between the rendered images and the reference images.
  • ...and 11 more figures
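
The synchronization described in the captions of Figures 2 and 3 can be sketched as follows. The excerpt does not specify the deformation functions induced from $f$, so this sketch assumes one common choice: moving each vertex along the normalized SDF gradient by a fraction of its signed distance. The sphere SDF, the finite-difference gradient, and the names `sdf_sphere`, `sdf_gradient`, `deform_vertices`, `step`, and `iterations` are all hypothetical.

```python
import math


def sdf_sphere(p, radius=1.0):
    """Signed distance to an origin-centred sphere (toy stand-in for the learned SDF f)."""
    return math.sqrt(sum(c * c for c in p)) - radius


def sdf_gradient(sdf, p, eps=1e-4):
    """Central finite-difference gradient of the SDF at point p."""
    grad = []
    for i in range(3):
        hi = list(p)
        lo = list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    return grad


def deform_vertices(sdf, vertices, step=0.5, iterations=20):
    """Pull vertices toward the SDF zero level set.

    Each iteration applies v <- v - step * f(v) * grad f(v) / |grad f(v)|,
    an assumed update rule, not necessarily the one used in the paper.
    """
    verts = [list(v) for v in vertices]
    for _ in range(iterations):
        for v in verts:
            d = sdf(v)
            g = sdf_gradient(sdf, v)
            norm = math.sqrt(sum(c * c for c in g)) or 1.0
            for i in range(3):
                v[i] -= step * d * g[i] / norm
    return verts


# Vertices that start off the surface (outside and inside the unit sphere).
initial = [(2.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.3, 0.3, 0.3), (-1.2, 0.4, 0.9)]
deformed = deform_vertices(sdf_sphere, initial)

# Largest remaining |f(v)|: a per-vertex analogue of the error maps in Figure 3.
max_error = max(abs(sdf_sphere(v)) for v in deformed)
```

Evaluating `abs(sdf_sphere(v))` per vertex at successive iterations gives a simple proxy for the surrogate-to-SDF distance that the Figure 3 error maps visualize.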