Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation

Yujin Chen, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Müller, Matthias Nießner

TL;DR

Mesh2NeRF directly derives ground-truth radiance fields from textured meshes by modeling a surface-thickness occupancy for density and a BRDF-based, view-dependent color under environment lighting, yielding a theoretically exact radiance field without multi-view rendering. This radiance field serves as direct supervision for NeRFs and diffusion-based 3D generation, replacing traditional rendering losses and reducing artifacts from 2D supervision. Empirically, Mesh2NeRF improves view synthesis in single-scene fitting (e.g., up to +3.12 dB PSNR on ABO), enhances conditional NeRF generation on ShapeNet Cars/Chairs, and produces more geometrically plausible results in unconditional NeRF generation on Objaverse Mugs. The approach is compatible with multiple NeRF representations (e.g., NeRF, NGP, TensoRF, SSDNeRF) and offers a robust mesh-based prior for 3D content generation from mesh collections.

Abstract

We present Mesh2NeRF, an approach to derive ground-truth radiance fields from textured meshes for 3D generation tasks. Many 3D generative approaches represent 3D scenes as radiance fields for training. Their ground-truth radiance fields are usually fitted from multi-view renderings of a large-scale synthetic 3D dataset, which often results in artifacts due to occlusions or under-fitting. In Mesh2NeRF, we propose an analytic solution to directly obtain ground-truth radiance fields from 3D meshes, characterizing the density field with an occupancy function featuring a defined surface thickness, and determining view-dependent color through a reflection function that considers both the mesh and environment lighting. Mesh2NeRF extracts accurate radiance fields, which provide direct supervision for training generative NeRFs and for single-scene representation. We validate the effectiveness of Mesh2NeRF across various tasks, achieving a 3.12 dB improvement in PSNR for view synthesis in single-scene representation on the ABO dataset, a 0.69 dB PSNR improvement in single-view conditional generation on ShapeNet Cars, and notably improved mesh extraction from NeRF in unconditional generation on Objaverse Mugs.
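The occupancy-style density with a fixed surface thickness can be illustrated with a small sketch. This is not the authors' implementation: it assumes a binary per-sample opacity (1 inside a thin band around the surface, 0 elsewhere), uses a unit sphere as a stand-in for a mesh distance query, and picks a hypothetical band half-width and sample count. The names `sphere_distance`, `occupancy_alpha`, and `render_ray` are illustrative only.

```python
import math

def sphere_distance(p, radius=1.0):
    """Unsigned distance from point p to a sphere surface.
    Assumption: the sphere stands in for a real mesh distance query."""
    return abs(math.sqrt(sum(c * c for c in p)) - radius)

def occupancy_alpha(p, half_thickness=0.005):
    """Binary opacity: 1 inside a thin band around the surface, else 0.
    Assumption: half_thickness is a hypothetical value."""
    return 1.0 if sphere_distance(p) < half_thickness else 0.0

def render_ray(origin, direction, color_fn, near=0.0, far=4.0, n_samples=2000):
    """Discrete volume rendering, C = sum_i T_i * alpha_i * c_i,
    where T_i is the transmittance accumulated before sample i."""
    step = (far - near) / n_samples
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * step  # midpoint of the i-th interval
        p = [o + t * d for o, d in zip(origin, direction)]
        alpha = occupancy_alpha(p)
        if alpha > 0.0:
            c = color_fn(p, direction)
            rgb = [acc + transmittance * alpha * ch for acc, ch in zip(rgb, c)]
            transmittance *= 1.0 - alpha
        if transmittance == 0.0:
            break  # fully opaque: later samples cannot contribute
    return rgb, transmittance

# A constant color stands in for the view-dependent reflection model.
constant_color = lambda p, d: (0.8, 0.2, 0.1)
hit_rgb, hit_T = render_ray([0.0, 0.0, -3.0], [0.0, 0.0, 1.0], constant_color)
miss_rgb, miss_T = render_ray([2.0, 0.0, -3.0], [0.0, 0.0, 1.0], constant_color)
```

Under these assumptions, a ray that crosses the band takes the color of its first in-band sample and ends with zero transmittance, while a ray that misses the band accumulates no color. The sample spacing must be finer than the band thickness, otherwise a ray can step over the surface.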

Paper Structure

This paper contains 19 sections, 12 equations, 17 figures, 5 tables.

Figures (17)

  • Figure 1: We propose Mesh2NeRF, a novel method for extracting ground truth radiance fields directly from 3D textured meshes by incorporating mesh geometry, texture, and environment lighting information. Mesh2NeRF serves as direct 3D supervision for neural radiance fields, offering a comprehensive approach to leveraging mesh data for improving novel view synthesis performance. Mesh2NeRF can function as supervision for generative models during training on mesh collections, advancing various 3D generation tasks, including unconditional and conditional generation.
  • Figure 2: Our method, illustrated above, constructs a ground truth radiance field from a textured mesh. Using a surface-based occupancy function with a distance threshold, we model the scene's density field. View-dependent color is then modeled, considering view direction, surface geometry, and light direction. Integrating samples along the camera ray enables accurate volume rendering from our defined radiance field. The bottom part showcases Mesh2NeRF as direct 3D supervision for NeRF tasks, where the density and color values of each ray sample supervise NeRF ray samples during optimization.
  • Figure 3: Volume renderings from the radiance fields defined by the Mesh2NeRF analytic solution. Our rendering results are very close to the ground truth, indicating that our defined radiance fields can serve as an effective ground truth representation for NeRFs.
  • Figure 4: Comparison of single scene fitting on the Country-Kitchen scene. Our results showcase higher accuracy and a superior ability to capture finer details in renderings when compared to the baseline method (Mesh2NeRF NGP vs Instant NGP).
  • Figure 5: Qualitative comparison of NeRF generation conditioned on single-view for unseen objects in ShapeNet Cars and Chairs between SSDNeRF and our method. Our approach enables more accurate novel views.
  • ...and 12 more figures
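
The Figure 2 caption describes supervising each NeRF ray sample directly with ground-truth density and color, rather than comparing rendered pixels. A minimal sketch of such a per-sample objective follows. The L1 form, the `color_weight` parameter, and the choice to supervise color only where the ground-truth opacity is non-zero are all assumptions made for illustration; the loss actually used in the paper is not given in this excerpt.

```python
def direct_sample_loss(pred_alpha, pred_rgb, gt_alpha, gt_rgb, color_weight=1.0):
    """Per-sample supervision along one ray (hypothetical L1 formulation).

    pred_alpha, gt_alpha: lists of per-sample opacities in [0, 1]
    pred_rgb, gt_rgb:     lists of per-sample (r, g, b) tuples
    """
    n = len(gt_alpha)
    # Opacity term: every sample on the ray is supervised.
    alpha_loss = sum(abs(pa - ga) for pa, ga in zip(pred_alpha, gt_alpha)) / n

    # Color term: only samples inside the surface band carry a meaningful color
    # (assumption: color is undefined in empty space and is skipped there).
    surface = [i for i, ga in enumerate(gt_alpha) if ga > 0.0]
    if surface:
        color_loss = sum(
            abs(pc - gc)
            for i in surface
            for pc, gc in zip(pred_rgb[i], gt_rgb[i])
        ) / (3 * len(surface))
    else:
        color_loss = 0.0

    return alpha_loss + color_weight * color_loss

# Three samples on a ray; only the middle one lies inside the surface band.
gt_alpha = [0.0, 1.0, 0.0]
gt_rgb = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.0, 0.0, 0.0)]
pred_alpha = [0.0, 0.5, 0.0]
pred_rgb = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.0, 0.0, 0.0)]
loss = direct_sample_loss(pred_alpha, pred_rgb, gt_alpha, gt_rgb)
```

Unlike a pixel-space rendering loss, an objective of this kind constrains every sample along the ray, so a sample hidden behind the surface in all training views still receives a training signal.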