MCMat: Multiview-Consistent and Physically Accurate PBR Material Generation

Shenhao Zhu, Lingteng Qiu, Xiaodong Gu, Zhengyi Zhao, Chao Xu, Yuxiao He, Zhe Li, Xiaoguang Han, Yao Yao, Xun Cao, Siyu Zhu, Weihao Yuan, Zilong Dong, Hao Zhu

TL;DR

This work introduces MCMat, a two-stage framework that generates multi-view-consistent, physically accurate PBR materials for 3D models, conditioned on text prompts or reference images. The generation stage employs MG-DiT, a multi-view diffusion transformer that takes surface normals as geometric conditions and uses a reference-based block to ensure cross-view consistency and fidelity to the reference; it is trained with a PBR-based diffusion loss. The refinement stage uses MR-DiT to convert the incomplete, low-resolution multi-view outputs into high-quality 2K UV-space textures through inpainting and detail enhancement, conditioned on a coarse material map and normal cues. Experiments on a large dataset of 3D models demonstrate state-of-the-art performance in text-to-PBR material generation and relighting, with significant improvements in realism, fidelity, and generalization under varying lighting conditions.
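The TL;DR names a PBR-based diffusion loss without defining it. As a hedged illustration only, the sketch below assumes one common form of such a loss: a rendering loss that shades the predicted and ground-truth albedo, roughness, and metallic maps with a Cook-Torrance (GGX) BRDF under the same random directional lights and compares the renders. The function names, the single-light shading model, the fixed camera, and the L1 penalty are all assumptions, not MCMat's published formulation.

```python
import numpy as np

# ASSUMPTION: an illustrative rendering loss; MCMat's actual PBR-based loss is not specified here.

def ggx_shade(albedo, roughness, metallic, normal, light_dir, view_dir):
    """Shade per-pixel PBR maps with a Cook-Torrance (GGX) BRDF and one directional light.

    albedo (H, W, 3); roughness, metallic (H, W, 1); normal (H, W, 3) unit vectors;
    light_dir, view_dir (3,) unit vectors pointing away from the surface.
    Returns reflected radiance (H, W, 3).
    """
    half = light_dir + view_dir
    half = half / np.linalg.norm(half)
    n_dot_l = np.clip(normal @ light_dir, 0.0, 1.0)[..., None]
    n_dot_v = np.clip(normal @ view_dir, 1e-4, 1.0)[..., None]
    n_dot_h = np.clip(normal @ half, 0.0, 1.0)[..., None]
    v_dot_h = np.clip(view_dir @ half, 0.0, 1.0)

    rough = np.clip(roughness, 0.04, 1.0)
    a2 = rough ** 4                                              # alpha = rough^2, so alpha^2 = rough^4
    ndf = a2 / (np.pi * (n_dot_h ** 2 * (a2 - 1.0) + 1.0) ** 2)  # GGX normal distribution
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic             # dielectrics ~4%, metals tinted by albedo
    fresnel = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5             # Schlick Fresnel approximation
    k = (rough + 1.0) ** 2 / 8.0                                 # Smith-Schlick remap for direct light
    geo = (n_dot_l / (n_dot_l * (1.0 - k) + k)) * (n_dot_v / (n_dot_v * (1.0 - k) + k))
    specular = ndf * fresnel * geo / (4.0 * n_dot_l * n_dot_v + 1e-6)
    diffuse = (1.0 - fresnel) * (1.0 - metallic) * albedo / np.pi
    return (diffuse + specular) * n_dot_l

def rendering_loss(pred, target, normal, n_lights=8, seed=0):
    """Mean L1 difference between renders of two (albedo, roughness, metallic) tuples."""
    rng = np.random.default_rng(seed)
    view_dir = np.array([0.0, 0.0, 1.0])                         # toward a camera on the +z axis
    total = 0.0
    for _ in range(n_lights):
        light_dir = rng.normal(size=3)
        light_dir[2] = abs(light_dir[2])                         # keep the light on the camera side
        light_dir /= np.linalg.norm(light_dir)
        diff = (ggx_shade(*pred, normal, light_dir, view_dir)
                - ggx_shade(*target, normal, light_dir, view_dir))
        total += np.abs(diff).mean()
    return total / n_lights

# Usage: a flat patch facing the camera, same albedo but wrong roughness.
H, W = 4, 4
normal = np.zeros((H, W, 3))
normal[..., 2] = 1.0
target = (np.full((H, W, 3), 0.5), np.full((H, W, 1), 0.5), np.zeros((H, W, 1)))
pred = (target[0], np.full((H, W, 1), 0.2), target[2])
print(rendering_loss(pred, target, normal))
```

Because both material sets are shaded under identical lights, a roughness or metallic error that leaves the albedo untouched still shows up as a difference in the rendered highlights, which is the usual motivation for supervising materials through a renderer rather than channel by channel.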

Abstract

Existing 2D methods use UNet-based diffusion models to generate multi-view physically-based rendering (PBR) maps but struggle with multi-view inconsistency, while some 3D methods directly generate UV maps but generalize poorly because 3D training data are limited. To address these problems, we propose a two-stage approach consisting of multi-view generation and UV material refinement. In the generation stage, we adopt a Diffusion Transformer (DiT) model to generate PBR materials, where both the specially designed multi-branch DiT blocks and the reference-based DiT blocks use a global attention mechanism to promote feature interaction and fusion across views, thereby improving multi-view consistency. In addition, we adopt a PBR-based diffusion loss to ensure that the generated materials conform to realistic physical principles. In the refinement stage, we propose a material-refined DiT that performs inpainting in empty areas and enhances details in UV space. Besides the normal condition, this refinement also takes the material map from the generation stage as a condition, which reduces the learning difficulty and improves generalization. Extensive experiments show that our method achieves state-of-the-art performance in texturing 3D objects with PBR materials and provides significant advantages for relighting applications. Project Page: https://lingtengqiu.github.io/2024/MCMat/
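The abstract says the multi-branch and reference-based DiT blocks use global attention across views. A minimal sketch, assuming "global" means single-head self-attention over the tokens of all views concatenated into one sequence (a common design in multi-view diffusion models, not confirmed as MCMat's exact block), contrasts it with attention restricted to each view. Function and variable names are hypothetical, and positional and view embeddings are omitted.

```python
import numpy as np

# ASSUMPTION: generic single-head attention for illustration; not MCMat's actual DiT block.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)          # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def per_view_attention(tokens, w_q, w_k, w_v):
    """Attention restricted to each view: tokens (V, N, D) never see other views."""
    d = tokens.shape[-1]
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v               # each (V, N, D)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))             # (V, N, N)
    return attn @ v

def global_multiview_attention(tokens, w_q, w_k, w_v):
    """Attention over all views jointly: flatten (V, N, D) into one (V*N, D) sequence."""
    n_views, n_tok, d = tokens.shape
    x = tokens.reshape(n_views * n_tok, d)                            # merge the view axis into the sequence
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(d))                              # (V*N, V*N), includes cross-view pairs
    return (attn @ v).reshape(n_views, n_tok, d)

rng = np.random.default_rng(0)
V, N, D = 4, 16, 32                                                   # e.g. 4 views, 16 patch tokens each
tokens = rng.normal(size=(V, N, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

# Perturb view 1 only, then check whether view 0's output changes.
perturbed = tokens.copy()
perturbed[1] += 1.0
global_changed = not np.allclose(global_multiview_attention(tokens, w_q, w_k, w_v)[0],
                                 global_multiview_attention(perturbed, w_q, w_k, w_v)[0])
local_changed = not np.allclose(per_view_attention(tokens, w_q, w_k, w_v)[0],
                                per_view_attention(perturbed, w_q, w_k, w_v)[0])
print(global_changed, local_changed)
```

Under global attention, editing one view changes the features of every other view, which is the information flow that lets a multi-view model keep views consistent; per-view attention leaves view 0 untouched. The cost is a (V*N)^2 attention map instead of V maps of size N^2.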

Paper Structure

This paper contains 30 sections, 16 equations, 13 figures, and 2 tables.

Figures (13)

  • Figure 1: A gallery of generated textured meshes. Our method effectively produces high-quality, lighting-independent, and highly faithful PBR materials across a wide range of objects from various categories, achieving highly realistic rendering results.
  • Figure 2: Our method consists of a generation stage and a refinement stage. In the generation stage, the Multi-View Generation DiT (MG-DiT) model uses the surface normals of the 3D model as geometric conditions, together with reference images and text descriptions, to generate multi-view-consistent PBR material properties. In the refinement stage, the Material Refinement DiT (MR-DiT) model inpaints void regions and enhances details in UV space, ultimately producing high-quality 2K-resolution textures with precise material information.
  • Figure 3: Structure of our Multi-View Generation DiT block.
  • Figure 4: Qualitative comparisons of PBR material generation conditioned on text prompts.
  • Figure 5: Qualitative comparisons of relighting results.
  • ...and 8 more figures
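Figure 2 describes MR-DiT inpainting void regions in UV space, and the abstract says it is conditioned on normals plus the generation stage's material map. A hedged sketch of how such inputs could be assembled follows. It assumes the multi-view outputs are baked into UV space by cosine-weighted blending (a common texture-baking heuristic, not confirmed for MCMat), that per-view texel samples, visibility, and view cosines come from a rasterizer that is not shown, and that the conditions are concatenated channel-wise. Both function names are hypothetical.

```python
import numpy as np

# ASSUMPTION: hypothetical baking and conditioning steps; MCMat's exact procedure may differ.

def bake_views_to_uv(view_colors, visibility, cos_theta, power=2.0):
    """Blend per-view samples into one coarse UV map.

    view_colors: (V, H, W, C) values each view projects onto each texel (precomputed);
    visibility: (V, H, W) bool; cos_theta: (V, H, W) cosine between the surface normal
    and the direction to each camera.
    Returns the coarse map (H, W, C) and an (H, W) mask of texels that no view covered.
    """
    weights = np.where(visibility, np.clip(cos_theta, 0.0, 1.0) ** power, 0.0)  # favor head-on views
    w_sum = weights.sum(axis=0)                                                # (H, W)
    coarse = (weights[..., None] * view_colors).sum(axis=0) / np.maximum(w_sum, 1e-8)[..., None]
    empty = w_sum <= 1e-8                                                      # void regions to inpaint
    coarse[empty] = 0.0
    return coarse, empty

def refinement_condition(coarse, empty, normal_uv):
    """Stack the coarse material map, UV-space normals, and the hole mask channel-wise."""
    mask = empty[..., None].astype(coarse.dtype)
    return np.concatenate([coarse, normal_uv, mask], axis=-1)

# Usage: two views over a 2x2 UV map; texel (1, 1) is seen by neither view.
colors = np.stack([np.ones((2, 2, 3)), np.zeros((2, 2, 3))])
vis = np.array([[[True, True], [False, False]],
                [[False, True], [True, False]]])
cos = np.ones((2, 2, 2))
coarse, empty = bake_views_to_uv(colors, vis, cos)
cond = refinement_condition(coarse, empty, np.zeros((2, 2, 3)))
print(cond.shape)
```

The mask channel tells the refiner which texels are pure holes and which already carry a blended estimate, so it can inpaint the former while only sharpening the latter; in a real pipeline the same baking would be applied to each PBR channel (albedo, roughness, metallic).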