PBR3DGen: A VLM-guided Mesh Generation with High-quality PBR Texture

Xiaokang Wei, Bowen Zhang, Xianghui Yang, Yuxuan Wang, Chunchao Guo, Xi Zhao, Yan Luximon

TL;DR

PBR3DGen addresses the entanglement of lighting and material properties in 3D asset generation with a two-stage framework. The first stage estimates multi-view PBR textures, guided by vision-language models (VLMs) and view-aware illumination priors. The second stage reconstructs a high-quality 3D mesh with PBR maps using a dual-head architecture. The VLM-guided diffusion stage reduces the ambiguity between albedo and highlights and supports spatially varying metalness and roughness, while the reconstruction stage produces relightable geometry via a NeuS-based representation. The approach reports state-of-the-art results for PBR estimation and 3D mesh quality on Objaverse-derived data, and it generalizes to novel-view and real-world settings with robust relighting. The result is a pipeline that produces PBR-ready 3D assets from a single image or a text prompt.

Abstract

Generating high-quality physically based rendering (PBR) materials is important for realistic rendering in downstream tasks, yet it remains challenging because the effects of materials and lighting are intertwined. While existing methods have made breakthroughs by incorporating material decomposition into the 3D generation pipeline, they tend to bake highlights into albedo and ignore the spatially varying properties of metallicity and roughness. In this work, we present PBR3DGen, a two-stage mesh generation method with high-quality PBR materials that integrates a novel multi-view PBR material estimation model and a 3D PBR mesh reconstruction model. Specifically, PBR3DGen leverages vision-language models (VLMs) to guide multi-view diffusion, precisely capturing the spatial distribution and inherent attributes of reflective-metalness materials. Additionally, we incorporate view-dependent, illumination-aware conditions as pixel-aware priors to enhance spatially varying material properties. Furthermore, our reconstruction model reconstructs high-quality meshes with PBR materials. Experimental results demonstrate that PBR3DGen significantly outperforms existing methods, achieving new state-of-the-art results for PBR estimation and mesh generation. More results and visualizations can be found on our project page: https://pbr3dgen1218.github.io/.
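
To make concrete why highlights baked into albedo hurt relighting, the following is a minimal sketch of how albedo, metallic, and roughness values are commonly turned into shading parameters. It assumes the glTF 2.0 metallic-roughness convention (dielectric reflectance of 0.04, roughness squared as the microfacet alpha); this section does not state the material model PBR3DGen uses, and the function name `shading_params` is hypothetical.

```python
from typing import Tuple

Color = Tuple[float, float, float]

# Assumption: glTF 2.0 default specular reflectance for non-metals.
DIELECTRIC_F0 = 0.04


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation from a (t=0) to b (t=1)."""
    return a + (b - a) * t


def shading_params(albedo: Color, metallic: float, roughness: float):
    """Map per-pixel PBR texture values to shading parameters.

    Returns (diffuse color, specular reflectance F0, microfacet alpha).
    """
    # Metals have no diffuse lobe: diffuse fades to black as metallic -> 1.
    diffuse = tuple(lerp(c, 0.0, metallic) for c in albedo)
    # Metals tint their specular reflection with the albedo color.
    f0 = tuple(lerp(DIELECTRIC_F0, c, metallic) for c in albedo)
    # Perceptual roughness is squared to get the microfacet alpha.
    alpha = roughness * roughness
    return diffuse, f0, alpha


if __name__ == "__main__":
    print(shading_params((0.8, 0.2, 0.2), metallic=0.0, roughness=0.5))
    print(shading_params((0.8, 0.2, 0.2), metallic=1.0, roughness=0.5))
```

Because the albedo feeds both the diffuse color and, for metals, the specular reflectance, a highlight that was baked into the albedo map reappears under every new lighting condition. Per-pixel `metallic` and `roughness` inputs are what "spatially varying" refers to here.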

Paper Structure

This paper contains 30 sections, 14 equations, 18 figures, 6 tables.

Figures (18)

  • Figure 1: We present PBR3DGen, a novel two-stage 3D asset generation framework with high-quality physically based rendering materials. All objects in the scene are generated by PBR3DGen.
  • Figure 2: Method overview. Our method consists of two stages: multi-view PBR material estimation and reconstruction of a 3D mesh with PBR materials. Given an RGB image as input, we first generate multi-view albedo images and multi-view MRO images using the multi-view PBR estimation model, and then reconstruct 3D assets with the dual-head PBR-based large reconstruction model.
  • Figure 3: Qualitative comparison of the generated 3D assets with other methods.
  • Figure 4: Text/Image to 3D results of our method.
  • Figure 5: Qualitative comparison with IntrinsicAnything (Chen et al., 2024).
  • ...and 13 more figures