EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction

Jingnan Gao, Zhuo Chen, Yichao Yan, Bowen Pan, Zhe Wang, Jiangjing Lyu, Xiaokang Yang

TL;DR

EvaSurf addresses the challenge of real-time, high-fidelity 3D reconstruction on mobile devices by coupling an efficient explicit geometry learner with a view-aware implicit texture and a lightweight neural shader. The method uses progressive grids and multi-view supervision to obtain accurate meshes, while a topologically structured, view-conditioned implicit texture captures view-dependent appearance with a small shader for rendering. Training is fast (1–2 hours on a single GPU) and results in a compact rendering package suitable for mobile deployment, achieving real-time performance (>40 FPS) with high-quality geometry and appearance. This work enables practical, device-friendly 3D reconstruction for applications in VR/AR, gaming, and on-device rendering, balancing geometry fidelity, rendering realism, and hardware efficiency.

Abstract

Reconstructing real-world 3D objects has numerous applications in computer vision, such as virtual reality, video games, and animation. Ideally, 3D reconstruction methods should generate high-fidelity, 3D-consistent results in real time. Traditional methods match pixels between images using photo-consistency constraints or learned features, while differentiable rendering methods like Neural Radiance Fields (NeRF) use differentiable volume rendering or surface-based representations to generate high-fidelity scenes. However, these methods require excessive rendering time, making them impractical for everyday applications. To address these challenges, we present $\textbf{EvaSurf}$, an $\textbf{E}$fficient $\textbf{V}$iew-$\textbf{A}$ware implicit textured $\textbf{Surf}$ace reconstruction method. In our method, we first employ an efficient surface-based model with a multi-view supervision module to ensure accurate mesh reconstruction. To enable high-fidelity rendering, we learn an implicit texture embedded with view-aware encoding to capture view-dependent information. Furthermore, with the explicit geometry and the implicit texture, we can employ a lightweight neural shader to reduce the computational cost and support real-time rendering on common mobile devices. Extensive experiments demonstrate that our method reconstructs high-quality appearance and accurate meshes on both synthetic and real-world datasets. Moreover, our method can be trained in just 1-2 hours on a single GPU and runs on mobile devices at over 40 FPS (frames per second), and the final package required for rendering takes up only 40-50 MB.
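To make the rendering path described in the abstract more concrete, here is a minimal sketch of how a per-pixel texture feature and an encoded view direction might be fed to a small shader network. Everything in it is an assumption for illustration: the spherical-Gaussian form of `view_encoding`, the six axis-aligned lobes, the feature size, the single hidden layer, and the random weights are hypothetical and are not taken from the paper. It only shows the general shape of "texture feature plus view-aware encoding goes into a lightweight MLP that outputs RGB".

```python
import math
import random

def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def view_encoding(view_dir, lobe_dirs, sharpness=4.0):
    """Response of a view direction to each Gaussian lobe.
    Assumed form (not from the paper): exp(-sharpness * (1 - d . mu))."""
    d = normalize(view_dir)
    return [
        math.exp(-sharpness * (1.0 - sum(a * b for a, b in zip(d, normalize(mu)))))
        for mu in lobe_dirs
    ]

def dense(x, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def tiny_shader(texture_feature, view_dir, lobe_dirs, params):
    """Map (texture feature, encoded view direction) to an RGB triple in (0, 1)."""
    x = list(texture_feature) + view_encoding(view_dir, lobe_dirs)
    hidden = [max(0.0, h) for h in dense(x, params["w1"], params["b1"])]  # ReLU
    out = dense(hidden, params["w2"], params["b2"])
    return [1.0 / (1.0 + math.exp(-o)) for o in out]  # sigmoid keeps RGB in (0, 1)

def random_params(n_in, n_hidden, seed=0):
    """Untrained placeholder weights; a real shader would learn these."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(3)],
        "b2": [0.0] * 3,
    }

# Hypothetical setup: six axis-aligned lobes and a 4-channel texture feature.
lobes = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
feature = [0.2, -0.5, 0.9, 0.1]
params = random_params(n_in=len(feature) + len(lobes), n_hidden=8)

rgb_front = tiny_shader(feature, [0, 0, 1], lobes, params)
rgb_side = tiny_shader(feature, [1, 0, 0], lobes, params)
```

In this sketch the same texture feature is shaded from two view directions, so any difference between `rgb_front` and `rgb_side` comes only from the view encoding. Keeping the network this small is one plausible way to keep per-pixel cost low on mobile hardware; the actual architecture used by EvaSurf is not specified in this excerpt.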

Paper Structure

This paper contains 18 sections, 9 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: Examples of reconstruction results of EvaSurf. Our model can reconstruct high-quality appearance and accurate mesh for both synthetic and real-world objects. EvaSurf also supports real-time rendering on various devices.
  • Figure 2: The overview of our full pipeline. We begin with an initial explicit geometry generated by a surface-based model. We then rasterize the mesh and utilize view-aware encoding to embed the view-dependent information into a learnable implicit texture. Given the geometry and texture, a neural shader renders the final RGB images.
  • Figure 3: View-dependent embedded mechanism in feature space. We equip our model with a set of Gaussians to capture the view-dependent information. The heatmap demonstrates the corresponding relation between the values and the view direction.
  • Figure 4: Comparison of meshes on the NeRF synthetic dataset. Our method generates more accurate meshes than previous methods, especially for objects with view-dependent effects. Chamfer Distance values (lower is better, in units of $10^{-3}$) are also provided.
  • Figure 5: Results on the real-world dataset. Our method reconstructs real-world 3D objects with high-fidelity renderings and more accurate meshes than existing methods. In the figure, "N2M Mesh-M" denotes the better-mesh setting of NeRF2Mesh and "N2M Mesh-R" its better-rendering setting.
  • ...and 7 more figures
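
The Chamfer Distance reported for the mesh comparison in Figure 4 can be illustrated with a short sketch. This is one common definition (the mean nearest-neighbour distance from each point set to the other, summed over both directions) applied to points sampled from two surfaces. Conventions differ between papers (squared versus unsquared distances, sum versus mean), and this excerpt does not say which one EvaSurf uses, so treat the function below as an assumption rather than the paper's exact metric.

```python
import math

def nearest_distance(p, points):
    """Euclidean distance from point p to its closest point in `points`."""
    return min(math.dist(p, q) for q in points)

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two non-empty point sets.
    Assumed convention: unsquared distances, averaged per direction, then summed."""
    a_to_b = sum(nearest_distance(p, b) for p in a) / len(a)
    b_to_a = sum(nearest_distance(q, a) for q in b) / len(b)
    return a_to_b + b_to_a

# Toy example: two nearly identical point sets sampled from two meshes.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]
cd = chamfer_distance(reference, reconstructed)
```

A lower value means the reconstructed surface lies closer to the reference. The brute-force nearest-neighbour search here is quadratic in the number of points; evaluations on dense samples would typically use a spatial index instead.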