MatLat: Material Latent Space for PBR Texture Generation

Kyeongmin Yeo, Yunhong Min, Jaihoon Kim, Minhyuk Sung

TL;DR

MatLat tackles the scarcity of high-quality PBR textures by leveraging pretrained diffusion priors through a latent-space adaptation approach. It introduces MatVAE, which uses residual prediction and KL regularization to extend a pretrained VAE to 5-channel PBR inputs, enabling stable diffusion fine-tuning in the adapted latent space. To ensure multi-view consistency, the method combines correspondence-aware attention with a locality regularizer that preserves latent–image spatial alignment. Comprehensive ablations and comparisons demonstrate state-of-the-art performance in PBR texture fidelity and cross-view coherence across diverse materials.
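As a rough illustration of how residual prediction and KL regularization can work together, the sketch below adds a predicted residual to the frozen encoder's latent parameters and measures how far the result drifts from the pretrained distribution. The function names, tensor shapes, and the choice of the frozen albedo latent as the KL reference are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def kl_diag_gaussians(mu_p, logvar_p, mu_q, logvar_q):
    """Closed-form KL(N(mu_p, var_p) || N(mu_q, var_q)) for diagonal Gaussians,
    summed over all latent elements."""
    var_p, var_q = np.exp(logvar_p), np.exp(logvar_q)
    kl = 0.5 * (logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    return float(kl.sum())

def residual_latent(mu_frozen, logvar_frozen, delta_mu, delta_logvar):
    """Residual prediction (hypothetical form): the adapted latent parameters are
    the frozen ones plus a learned offset, so a zero offset recovers the
    pretrained latent exactly."""
    return mu_frozen + delta_mu, logvar_frozen + delta_logvar

rng = np.random.default_rng(0)
shape = (4, 8, 8)  # assumed latent shape: channels x height x width

# Stand-ins for the frozen VAE's output on the albedo image.
mu_a = rng.normal(size=shape)
logvar_a = rng.normal(scale=0.1, size=shape)

# Zero residual: the adapted latent equals the pretrained one, so the KL is zero.
mu0, lv0 = residual_latent(mu_a, logvar_a, np.zeros(shape), np.zeros(shape))
kl_zero = kl_diag_gaussians(mu0, lv0, mu_a, logvar_a)

# Non-zero residual: the KL term grows with the deviation from the pretrained latent.
mu1, lv1 = residual_latent(mu_a, logvar_a, 0.5 * np.ones(shape), 0.2 * np.ones(shape))
kl_shifted = kl_diag_gaussians(mu1, lv1, mu_a, logvar_a)

print(kl_zero, kl_shifted)
```

In this toy setup the KL term acts as a penalty on the residual itself: the encoder is free to inject information about the extra material channels, but only at the cost of moving away from the latent distribution the diffusion model was trained on.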

Abstract

We propose a generative framework for producing high-quality PBR textures on a given 3D mesh. As large-scale PBR texture datasets are scarce, our approach focuses on effectively leveraging the embedding space and diffusion priors of pretrained latent image generative models while learning a material latent space, MatLat, through targeted fine-tuning. Prior methods freeze the embedding network, which causes distribution shifts when encoding additional PBR channels and hinders subsequent diffusion training. In contrast, we fine-tune the pretrained VAE so that new material channels can be incorporated with minimal deviation from the pretrained latent distribution. We further show that correspondence-aware attention alone is insufficient for cross-view consistency unless the latent-to-image mapping preserves locality. To enforce this locality, we introduce a regularization term in the VAE fine-tuning that crops latent patches, decodes them, and aligns them with the corresponding image regions to maintain strong pixel-latent spatial correspondence. Ablation studies and comparisons with previous baselines demonstrate that our framework improves PBR texture fidelity and that each component is critical for achieving state-of-the-art performance.
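The crop-decode-align idea behind the locality regularization can be illustrated with a toy example. The sketch below is not the paper's implementation: the two decoders, the upsampling factor `SCALE`, and the mean-squared-error loss are hypothetical stand-ins chosen so the example runs without a trained VAE, and the target image is produced by the toy decoder itself rather than taken from a dataset.

```python
import numpy as np

SCALE = 8  # assumed latent-to-pixel upsampling factor

def toy_local_decoder(latent):
    """Stand-in decoder in which every pixel depends on exactly one latent token:
    channel mean followed by nearest-neighbor upsampling."""
    img = latent.mean(axis=0, keepdims=True)
    return img.repeat(SCALE, axis=1).repeat(SCALE, axis=2)

def toy_global_decoder(latent):
    """Stand-in decoder that breaks locality by mixing in the global latent mean."""
    return toy_local_decoder(latent) + latent.mean()

def locality_loss(decoder, latent, image, top, left, size):
    """Crop a latent patch, decode it on its own, and compare the result with
    the spatially corresponding region of the full image."""
    patch = latent[:, top:top + size, left:left + size]
    decoded = decoder(patch)
    target = image[:, top * SCALE:(top + size) * SCALE,
                   left * SCALE:(left + size) * SCALE]
    return float(np.mean((decoded - target) ** 2))

rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 16, 16))  # assumed latent shape

# Full-image targets produced by each decoder from the full latent.
image_local = toy_local_decoder(latent)
image_global = toy_global_decoder(latent)

loss_local = locality_loss(toy_local_decoder, latent, image_local, 4, 4, 8)
loss_global = locality_loss(toy_global_decoder, latent, image_global, 4, 4, 8)

print(loss_local, loss_global)
```

The local decoder reproduces the image crop exactly from the latent crop, so its loss is zero, while the decoder with global mixing is penalized. Minimizing a loss of this kind during fine-tuning would push a decoder toward the first behavior.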

Paper Structure

This paper contains 44 sections, 11 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: PBR Textures Generated by MatLat. Our method produces PBR textures that accurately represent rough materials (left), metallic surfaces (middle), and complex mixed materials (right).
  • Figure 2: Overview of Frozen VAE [He:2025materialmvp]. The zero-padded roughness–metallic maps and the albedo image are encoded by the frozen VAE to produce $(\boldsymbol{\mu}_{\text{rm}}, \boldsymbol{\sigma}_{\text{rm}})$ and $(\boldsymbol{\mu}_{\text{a}}, \boldsymbol{\sigma}_{\text{a}})$, respectively.
  • Figure 3: Comparison of PBR Material Encoder Schemes. (a) LayerDiffuse [Zhang:2024layerdiffuse] uses residual prediction but only predicts the latent mean and enforces identity consistency via $\mathcal{L}_{\text{id}}$. (b) Orchid [Krishnan:2025orchid] uses direct prediction to output the full latent parameters $(\boldsymbol{\mu}_{\text{full}}, \boldsymbol{\sigma}_{\text{full}})$ and regularizes them with $\mathcal{L}_{\text{reg}}$. (c) MatVAE (ours) synergistically integrates residual prediction and KL regularization, enabling effective incorporation of PBR material images while preserving the pretrained latent space.
  • Figure 4: Illustration of Correspondence-Aware Attention (CAA) and Locality Regularization. (a) CAA restricts attention to geometrically corresponding tokens across views. (b) Locality regularizer enforces patch-wise reconstruction such that image pixels are decoded from spatially aligned latent tokens. Together, CAA and locality regularization enable multi-view consistent PBR texture generation.
  • Figure 5: Qualitative Comparisons across Components. Combining residual prediction with $\mathcal{L}_\text{reg}$ effectively preserves the pretrained latent distribution and yields high-quality PBR textures.
  • ...and 3 more figures
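
The correspondence-aware attention described in the Figure 4 caption restricts each token to attend only to geometrically corresponding tokens in other views. A minimal sketch of that masking step follows. The correspondence indices, token counts, and the single-head formulation are hypothetical, and the sketch assumes every query has at least one allowed key.

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted by a boolean mask.
    q: (N, d) queries, k and v: (M, d) keys and values,
    mask: (N, M), True where attention is allowed.
    Assumes every row of mask contains at least one True entry."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # disallowed pairs get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n, d = 4, 8
q = rng.normal(size=(n, d))  # tokens from view A
k = rng.normal(size=(n, d))  # tokens from view B
v = rng.normal(size=(n, d))

# Hypothetical correspondences: token i in view A maps to token corr[i] in view B.
corr = np.array([2, 0, 3, 1])
mask = np.zeros((n, n), dtype=bool)
mask[np.arange(n), corr] = True
mask[0, 1] = True  # query 0 is given a second corresponding token

out, weights = masked_attention(q, k, v, mask)
print(weights.round(3))
```

Queries with a single correspondence simply copy the matching value vector, while query 0 blends its two allowed tokens. How such correspondences would be derived from the mesh geometry is outside the scope of this sketch.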