MatLat: Material Latent Space for PBR Texture Generation
Kyeongmin Yeo, Yunhong Min, Jaihoon Kim, Minhyuk Sung
TL;DR
MatLat tackles the scarcity of high-quality PBR textures by leveraging pretrained diffusion priors through a latent-space adaptation approach. It introduces MatVAE, which uses residual prediction and KL regularization to extend a pretrained VAE to 5-channel PBR inputs, enabling stable diffusion fine-tuning in the adapted latent space. To ensure multi-view consistency, the method combines correspondence-aware attention with a locality regularizer that preserves latent–image spatial alignment. Comprehensive ablations and comparisons demonstrate state-of-the-art performance in PBR texture fidelity and cross-view coherence across diverse materials.
Abstract
We propose a generative framework for producing high-quality PBR textures on a given 3D mesh. As large-scale PBR texture datasets are scarce, our approach focuses on effectively leveraging the embedding space and diffusion priors of pretrained latent image generative models while learning a material latent space, MatLat, through targeted fine-tuning. Prior methods freeze the embedding network, which causes distribution shifts when encoding additional PBR channels and hinders subsequent diffusion training. In contrast, we fine-tune the pretrained VAE so that new material channels can be incorporated with minimal deviation of the latent distribution. We further show that correspondence-aware attention alone is insufficient for cross-view consistency unless the latent-to-image mapping preserves locality. To enforce this locality, we introduce a regularizer during VAE fine-tuning that crops latent patches, decodes them, and aligns the results with the corresponding image regions, maintaining strong pixel-latent spatial correspondence. Ablation studies and comparisons with prior baselines demonstrate that our framework improves PBR texture fidelity and that each component is critical for achieving state-of-the-art performance.
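The crop-and-decode idea behind the locality regularizer can be illustrated with a toy sketch. This is not the authors' implementation: the two decoders, the helper names (`decode_local`, `decode_nonlocal`, `crop`, `locality_loss`), the upsampling factor, and the choice of comparing against the matching crop of the full decode (rather than, say, a ground-truth image region) are all assumptions made for illustration. A perfectly local decoder gives zero loss, while a decoder that mixes in global context is penalized.

```python
# Toy illustration of a crop-decode locality loss (hypothetical, pure Python).

def decode_local(latent, scale=2):
    """Hypothetical perfectly local decoder: nearest-neighbour upsampling."""
    image = []
    for row in latent:
        up_row = [v for v in row for _ in range(scale)]
        for _ in range(scale):
            image.append(list(up_row))
    return image

def decode_nonlocal(latent, scale=2):
    """Hypothetical decoder that leaks global context:
    adds the mean of its input latent to every output pixel."""
    flat = [v for row in latent for v in row]
    mean = sum(flat) / len(flat)
    return [[v + mean for v in row] for row in decode_local(latent, scale)]

def crop(grid, top, left, height, width):
    """Return the height x width sub-grid starting at (top, left)."""
    return [row[left:left + width] for row in grid[top:top + height]]

def locality_loss(decode, latent, top, left, size, scale=2):
    """Mean squared error between decode(crop(latent)) and the
    matching crop of decode(latent)."""
    patch_image = decode(crop(latent, top, left, size, size), scale)
    full_image = decode(latent, scale)
    target = crop(full_image, top * scale, left * scale,
                  size * scale, size * scale)
    sq = [(a - b) ** 2
          for row_a, row_b in zip(patch_image, target)
          for a, b in zip(row_a, row_b)]
    return sum(sq) / len(sq)

# 4x4 single-channel latent with values 0..15.
latent = [[float(4 * r + c) for c in range(4)] for r in range(4)]

loss_local = locality_loss(decode_local, latent, top=0, left=0, size=2)
loss_nonlocal = locality_loss(decode_nonlocal, latent, top=0, left=0, size=2)
print(loss_local, loss_nonlocal)
```

In this toy setting the loss acts as a measure of how much a decoded pixel depends on latents outside its own patch; in an actual VAE fine-tuning loop a term of this kind would be added to the reconstruction and KL objectives with some weighting, which is not specified here.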
