DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance

Linxuan Xin, Zheng Zhang, Jinfu Wei, Wei Gao, Duan Gao

TL;DR

DreamPBR is a diffusion-based framework for generating high-resolution SVBRDF maps from text and multi-modal inputs. A material Latent Diffusion Model (LDM) learns a latent representation of albedo maps, and a rendering-aware PBR decoder reconstructs the full SVBRDF parameter maps from that latent; convolution with circular padding makes the outputs tileable. Three control modules, Pixel Control, Style Control, and Shape Control, provide pixel-aligned, style-image, and 3D-shape guidance, and a rendering-aware super-resolution module improves final texture quality. The paper demonstrates the framework on controllable generation and editing applications, including tileable generation, inpainting, and exemplar-driven styling.

Abstract

Prior material creation methods had limitations in producing diverse results mainly because reconstruction-based methods relied on real-world measurements and generation-based methods were trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls, providing high controllability and diversity in material generation. The key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs with material priors derived from hundreds of PBR material samples. We utilize a novel material Latent Diffusion Model (LDM) to establish the mapping between albedo maps and the corresponding latent space. The latent representation is then decoded into full SVBRDF parameter maps using a rendering-aware PBR decoder. Our method supports tileable generation through convolution with circular padding. Furthermore, we introduce a multi-modal guidance module, which includes pixel-aligned guidance, style image guidance, and 3D shape guidance, to enhance the control capabilities of the material LDM. We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.
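
The abstract attributes tileability to convolution with circular padding. The following is a minimal sketch of why that works, not the authors' implementation: the function names `circular_conv2d` and `roll`, the toy image, and the Laplacian-style kernel are all assumptions made for illustration. With wrap-around indexing, a filter treats the left and right (and top and bottom) borders as neighbours, so filtering commutes with shifting the tile origin and no seam is introduced at the edges.

```python
def circular_conv2d(image, kernel):
    """Cross-correlate a 2D grid with an odd-sized kernel, wrapping at the borders."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Modulo indexing plays the role of circular padding:
                    # row -1 reads row h-1, column w reads column 0.
                    acc += kernel[i][j] * image[(y + i - ph) % h][(x + j - pw) % w]
            out[y][x] = acc
    return out


def roll(image, dy, dx):
    """Shift a grid by (dy, dx) with wrap-around, like moving the tile origin."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


# Hypothetical toy data: a 4x6 "texture" and a 3x3 Laplacian-style kernel.
image = [[float((3 * y + 5 * x) % 7) for x in range(6)] for y in range(4)]
kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]

# Filtering commutes with a wrap-around shift, so the border is not special.
shift_then_filter = circular_conv2d(roll(image, 1, 2), kernel)
filter_then_shift = roll(circular_conv2d(image, kernel), 1, 2)
print(shift_then_filter == filter_then_shift)  # prints True
```

In a deep-learning framework the same effect is usually obtained through the layer's padding option rather than hand-written loops; in PyTorch, for example, `torch.nn.Conv2d` accepts `padding_mode="circular"`. Whether DreamPBR applies it to every convolution or only to some is not stated in this excerpt.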

Paper Structure (30 sections, 3 equations, 16 figures)

Figures (16)

  • Figure 1: Overview of DreamPBR. The denoising UNet in our Material LDM is trained on albedo textures only (upper left), and a PBR Decoder with Highlight Aware Decoder transforms albedo textures into the other physically-based textures (middle right). The blue box on the left shows three individual control modules: Pixel Control, Style Control, and Shape Control; results under each control are shown on the lower right. An additional rendering-aware super-resolution module produces higher-quality textures (upper right).
  • Figure 2: Generation results of DreamPBR under text-only conditions. We randomly sampled materials of various types with a wide range of tags, using prompts of the form "a PBR material of [type], [tags]". DreamPBR generates materials that match the descriptions and also produces out-of-domain materials, such as a brick material of snow-covered bricks, a plastic material of a children's playground slide, and a wall material of street art graffiti.
  • Figure 3: Pixel Control results with the same pattern but different materials. The binary images in the first column are the control conditions (different sketches). The generated materials to their right follow the given patterns while respecting material-specific properties, such as the edges of bricks and the growth rings of wood.
  • Figure 4: Pixel Control results with the same material but different patterns. The first column gives the material descriptions, and each control pattern is shown in the lower right corner of the rendered image. As in Figure 3, the results show strong consistency between materials and patterns.
  • Figure 5: Style Control results with the same style but different materials. The style images are given in the first column, and each material description appears below its image. This gives users a more artistic way to design textures.
  • ...and 11 more figures