MatFuse: Controllable Material Generation with Diffusion Models

Giuseppe Vecchio, Renato Sortino, Simone Palazzo, Concetto Spampinato

TL;DR

MatFuse is a unified approach that harnesses the generative power of diffusion models to create and edit 3D materials. It integrates multiple sources of conditioning, including color palettes, sketches, text, and pictures, which broadens creative possibilities and grants fine-grained control over material synthesis.

Abstract

Creating high-quality materials in computer graphics is a challenging and time-consuming task, which requires great expertise. To simplify this process, we introduce MatFuse, a unified approach that harnesses the generative power of diffusion models for creation and editing of 3D materials. Our method integrates multiple sources of conditioning, including color palettes, sketches, text, and pictures, enhancing creative possibilities and granting fine-grained control over material synthesis. Additionally, MatFuse enables map-level material editing capabilities through latent manipulation by means of a multi-encoder compression model which learns a disentangled latent representation for each map. We demonstrate the effectiveness of MatFuse under multiple conditioning settings and explore the potential of material editing. Finally, we assess the quality of the generated materials both quantitatively in terms of CLIP-IQA and FID scores and qualitatively by conducting a user study. Source code for training MatFuse and supplemental materials are publicly available at https://gvecchio.com/matfuse.

Paper Structure (19 sections, 2 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Overview of the MatFuse framework: At training time, VQ-GAN encoder $\mathcal{E}$ projects data from the pixel space to a more compact latent embedding $z$; the diffusion process runs on this latent space; conditioning is carried out through cross-attention for global conditions (red in figure), and through concatenation with the noise for local conditions (blue in figure); the output maps are finally obtained by projecting the conditioned reconstructed latent space $\hat{z}$ back into the pixel space through VQ-GAN decoder $\mathcal{D}$.
  • Figure 2: Overview of the compression model architecture. Reflectance maps (diffuse, normal, roughness, and specular) are fed to the encoders. Features extracted for each map are quantized and concatenated before being passed to the decoder, which reconstructs the original maps.
  • Figure 3: Globally conditioned material generation. We evaluate MatFuse when guided with single conditions. First two rows: text-conditioned map generation; middle two rows: image-prompted generation, yielding maps with features of the input image; last two rows: palette-conditioned generation.
  • Figure 4: Locally conditioned material generation. We provide sketches to condition MatFuse and produce maps with well-defined edges. The first two rows present hand-drawn sketches, while the last row uses a sketch obtained from a material picture. This shows the robustness of MatFuse in handling both clean and noisy sketches.
  • Figure 5: Multimodal conditioned material generation. First row: text prompt + sketch. Second row: image prompt + sketch. Third row: color palette + sketch.
  • ...and 2 more figures
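
The two conditioning paths named in the Figure 1 caption can be illustrated with a small NumPy sketch: local conditions are concatenated with the noisy latent along the channel axis, while global conditions are injected through cross-attention. This is not the authors' implementation. The function names, tensor sizes, single attention head, random projection matrices, and the assumption that the sketch has already been brought to the latent's spatial resolution are all hypothetical choices made for illustration.

```python
import numpy as np

def concat_local_condition(noisy_latent, local_cond):
    """Local conditioning: stack a spatial condition onto the noisy latent (channel axis)."""
    if noisy_latent.shape[1:] != local_cond.shape[1:]:
        raise ValueError("local condition must match the latent's spatial size")
    return np.concatenate([noisy_latent, local_cond], axis=0)

def cross_attention(features, context, w_q, w_k, w_v):
    """Global conditioning: single-head cross-attention from latent positions to condition tokens."""
    c, h, w = features.shape
    tokens = features.reshape(c, h * w).T            # (h*w, c): one query per latent position
    q = tokens @ w_q                                  # (h*w, d)
    k = context @ w_k                                 # (n_ctx, d)
    v = context @ w_v                                 # (n_ctx, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])           # scaled dot-product scores
    scores -= scores.max(axis=-1, keepdims=True)      # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the condition tokens
    return weights @ v, weights

rng = np.random.default_rng(0)
z_t = rng.normal(size=(4, 8, 8))      # noisy latent: hypothetical 4 channels on an 8x8 grid
sketch = rng.normal(size=(1, 8, 8))   # sketch at latent resolution (assumption)
ctx = rng.normal(size=(3, 16))        # 3 global condition tokens of width 16 (assumption)

# Local path: the denoiser would receive the latent and the sketch as one tensor.
unet_input = concat_local_condition(z_t, sketch)

# Global path: latent positions attend to the condition tokens.
d = 8                                  # hypothetical attention width
w_q = rng.normal(size=(4, d))
w_k = rng.normal(size=(16, d))
w_v = rng.normal(size=(16, d))
attended, weights = cross_attention(z_t, ctx, w_q, w_k, w_v)
```

The design difference is visible in the shapes: concatenation keeps a pixel-aligned correspondence between the condition and the latent, which suits sketches, whereas cross-attention lets every latent position draw on every condition token, which suits conditions without spatial layout such as text or a color palette.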