MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR

Xudong Xu, Zhaoyang Lyu, Xingang Pan, Bo Dai

TL;DR

The paper addresses the difficulty of recovering high-fidelity object materials in text-to-3D generation. It introduces MATLABER, which employs a latent BRDF auto-encoder trained on real BRDF datasets to produce BRDF latent codes that decode to physically plausible BRDF parameters, enabling disentanglement from environment lights and relightable rendering. Appearance is modeled via a material MLP that predicts BRDF latents across a DMTet geometry, guided by Score Distillation Sampling and a Cook–Torrance BRDF framework, with a smooth latent space enforced by KL, smoothness, and cyclic losses. Empirical results show improved realism, detail, and disentanglement over baselines, plus successful relighting and material editing, illustrating the practical impact for 3D content creation. The work also provides a pathway to generalize material priors to downstream tasks, albeit with geometry-related limitations that future work could address with advanced diversification strategies like Variational Score Distillation.
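To make the role of the BRDF parameters concrete, the following is a minimal sketch of evaluating a Cook-Torrance style BRDF from a 7-dimensional parameter vector (diffuse RGB, specular RGB, scalar roughness). The GGX distribution, Schlick Fresnel, and Smith-Schlick geometry terms are a common textbook choice and an assumption here; the paper's exact shading terms may differ, and the function name `cook_torrance` is hypothetical.

```python
import numpy as np


def normalize(v):
    # Guard against zero-length vectors (e.g. when view and light are opposite).
    return v / max(float(np.linalg.norm(v)), 1e-8)


def cook_torrance(n, v, l, brdf):
    """Shade one surface point for a single light direction.

    n, v, l : normal, view and light directions, each of shape (3,)
    brdf    : 7 values = diffuse RGB (3), specular RGB (3), roughness (1)
    Returns the RGB reflectance multiplied by the cosine term.
    """
    k_d, k_s, k_r = brdf[:3], brdf[3:6], float(brdf[6])
    n, v, l = normalize(n), normalize(v), normalize(l)
    h = normalize(v + l)  # half vector

    n_l = max(float(n @ l), 0.0)
    n_v = max(float(n @ v), 1e-4)
    n_h = max(float(n @ h), 0.0)
    v_h = max(float(v @ h), 0.0)

    alpha = max(k_r, 1e-3) ** 2
    # GGX normal distribution term (assumed choice).
    d = alpha**2 / (np.pi * (n_h**2 * (alpha**2 - 1.0) + 1.0) ** 2)
    # Schlick approximation of the Fresnel term, with k_s as reflectance at normal incidence.
    f = k_s + (1.0 - k_s) * (1.0 - v_h) ** 5
    # Smith-Schlick shadowing/masking term.
    k = alpha / 2.0
    g = (n_l / (n_l * (1.0 - k) + k)) * (n_v / (n_v * (1.0 - k) + k))

    specular = d * f * g / max(4.0 * n_l * n_v, 1e-6)
    diffuse = k_d / np.pi
    return (diffuse + specular) * n_l


# Example: a reddish, fairly rough material lit and viewed head-on.
params = np.array([0.8, 0.2, 0.2, 0.04, 0.04, 0.04, 0.5])
up = np.array([0.0, 0.0, 1.0])
rgb = cook_torrance(up, up, up, params)
```

Because lighting enters only through `l`, the same material parameters can be re-evaluated under a different illumination, which is the property that relighting relies on.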

Abstract

Based on powerful text-to-image diffusion models, text-to-3D generation has made significant progress in generating compelling geometry and appearance. However, existing methods still struggle to recover high-fidelity object materials, either considering only Lambertian reflectance or failing to disentangle BRDF materials from the environment lights. In this work, we propose Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR (MATLABER), which leverages a novel latent BRDF auto-encoder for material generation. We train this auto-encoder on large-scale real-world BRDF collections and ensure the smoothness of its latent space, which implicitly acts as a natural distribution of materials. During appearance modeling in text-to-3D generation, a material network predicts latent BRDF embeddings rather than BRDF parameters. Through exhaustive experiments, our approach demonstrates superiority over existing methods in generating realistic and coherent object materials. Moreover, high-quality materials naturally enable multiple downstream tasks such as relighting and material editing. Code and model will be publicly available at https://sheldontsui.github.io/projects/Matlaber.

Paper Structure (12 sections, 10 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Text-to-3D generation aims to synthesize high-quality 3D assets aligning with given text descriptions. Despite the impressive appearance, representative methods like DreamFusion [poole2022dreamfusion] and Fantasia3D [chen2023fantasia3d] still fail to recover high-fidelity object materials. Specifically, DreamFusion only considers diffuse materials while Fantasia3D always predicts BRDF materials entangled with environment lights. Based on a latent BRDF auto-encoder, our approach is capable of generating natural materials for 3D assets, enabling realistic renderings under different illuminations.
  • Figure 2: Left: Our latent BRDF auto-encoder is trained on the TwoShotBRDF dataset with four losses: a reconstruction loss, a KL divergence loss, a smoothness loss, and a cyclic loss. Imposing the KL divergence and smoothness losses on the latent embeddings encourages a smooth latent space [boss2021neural]. Right: Instead of predicting BRDF materials directly, we use a material MLP $\Gamma$ to generate a latent BRDF code $\mathbf{z}$, which is then decoded to 7-dim BRDF parameters by our pretrained decoder. As in prior works, the SDS loss is applied to the rendered images and drives the training of our material MLP. (Note that roughness $k_r$ is a scalar; we visualize it with the green channel in this paper.)
  • Figure 3: The gallery of our text-to-3D generation results. Shapes, normal maps, and shaded images from two random viewpoints are presented here.
  • Figure 4: Qualitative comparisons to baselines. Our results have more natural textures and richer details.
  • Figure 5: Relighting results. On the left side, we list the generated BRDF materials, including diffuse, specular, and roughness. The relit images under a rotating environment light are presented on the right side.
  • ...and 2 more figures
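
The two halves of the Figure 2 caption can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the encoder, decoder, and material network are toy random layers, the latent size of 4 is arbitrary, and the concrete forms of the smoothness and cyclic losses are guesses at what the loss names suggest. The names `encode`, `decode`, `autoencoder_losses`, and `material_field` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
BRDF_DIM = 7      # diffuse RGB (3) + specular RGB (3) + roughness (1)
LATENT_DIM = 4    # assumed latent size, chosen only for illustration

# Toy random weights standing in for trained networks (hypothetical).
W_enc = rng.normal(scale=0.3, size=(BRDF_DIM, 2 * LATENT_DIM))
W_dec = rng.normal(scale=0.3, size=(LATENT_DIM, BRDF_DIM))
W_mat = rng.normal(scale=0.3, size=(3, LATENT_DIM))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode(x):
    """Map BRDF parameters to the mean and log-variance of a latent Gaussian."""
    h = x @ W_enc
    return h[:, :LATENT_DIM], h[:, LATENT_DIM:]


def decode(z):
    """Map latent codes to 7-dim BRDF parameters squashed into (0, 1)."""
    return sigmoid(z @ W_dec)


def autoencoder_losses(x, noise_scale=0.01):
    mu, logvar = encode(x)
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)  # reparameterisation
    x_hat = decode(z)

    recon = np.mean((x_hat - x) ** 2)
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = np.mean(-0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1))
    # Smoothness (assumed form): a small latent perturbation should barely change the output.
    z_near = z + noise_scale * rng.normal(size=z.shape)
    smooth = np.mean((decode(z_near) - x_hat) ** 2)
    # Cyclic (assumed form): decode a prior sample, re-encode it, and compare with the sample.
    z_prior = rng.normal(size=z.shape)
    mu_cycle, _ = encode(decode(z_prior))
    cyclic = np.mean((mu_cycle - z_prior) ** 2)
    return {"recon": recon, "kl": kl, "smooth": smooth, "cyclic": cyclic}


def material_field(points):
    """Toy material network: 3D surface points -> latent code -> decoded BRDF parameters."""
    z = np.tanh(points @ W_mat)
    return decode(z)


brdf_batch = rng.uniform(size=(8, BRDF_DIM))
losses = autoencoder_losses(brdf_batch)
surface_brdf = material_field(rng.uniform(-1.0, 1.0, size=(5, 3)))
```

Routing the material network's output through a fixed decoder means every predicted material is constrained to what the decoder can produce, which is how the learned latent space acts as a prior over materials.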