MatPedia: A Universal Generative Foundation for High-Fidelity Material Synthesis
Di Luo, Shuhui Yang, Mingxin Yang, Jiawei Lu, Yixuan Tang, Xintong Han, Zhuo Chen, Beibei Wang, Chunchao Guo
TL;DR
MatPedia addresses the lack of a unified representation for RGB appearance and PBR properties by introducing a joint RGB-PBR latent space in which the RGB image and the four PBR maps are treated as a five-frame sequence. A video diffusion transformer, initialized from large-scale RGB-video priors and fine-tuned with LoRA, supports text-to-material generation, image-to-material generation, and intrinsic decomposition at native 1024×1024 resolution. The hybrid MatHybrid-410K dataset pairs PBR materials with RGB appearance data, so that abundant RGB images can improve PBR synthesis. Results show improved quality and diversity over task-specific baselines, suggesting a scalable foundation for realistic material generation in 3D assets.
Abstract
Physically-based rendering (PBR) materials are fundamental to photorealistic graphics, yet their creation remains labor-intensive and requires specialized expertise. While generative models have advanced material synthesis, existing methods lack a unified representation that bridges natural image appearance and PBR properties, which leads to fragmented task-specific pipelines and an inability to leverage large-scale RGB image data. We present MatPedia, a foundation model built upon a novel joint RGB-PBR representation that compactly encodes materials into two interdependent latents: one for RGB appearance and one for the four PBR maps, which encode complementary physical properties. By formulating them as a 5-frame sequence and employing video diffusion architectures, MatPedia naturally captures their correlations while transferring visual priors from RGB generation models. This joint representation enables a unified framework that handles multiple material tasks (text-to-material generation, image-to-material generation, and intrinsic decomposition) within a single architecture. Trained on MatHybrid-410K, a mixed corpus combining PBR datasets with large-scale RGB images, MatPedia achieves native $1024\times1024$ synthesis that substantially surpasses existing approaches in both quality and diversity.
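To make the 5-frame formulation concrete, the following is a minimal sketch of how one RGB image and four PBR maps might be stacked into a video-like tensor before encoding. It is an illustration under stated assumptions, not the authors' implementation: the map names (basecolor, normal, roughness, metallic), the frame order, the replication of single-channel maps to three channels, and the function names are all hypothetical, since the abstract only says "four PBR maps".

```python
import numpy as np

# Assumed names and order; the abstract does not name the four PBR maps.
PBR_MAP_NAMES = ("basecolor", "normal", "roughness", "metallic")


def to_three_channels(image: np.ndarray) -> np.ndarray:
    """Return an (H, W, 3) array, repeating a single channel if needed."""
    if image.ndim == 2:
        image = image[..., None]
    if image.shape[-1] == 1:
        image = np.repeat(image, 3, axis=-1)
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError(f"expected 1 or 3 channels, got shape {image.shape}")
    return image


def build_frame_sequence(rgb: np.ndarray, pbr_maps: dict) -> np.ndarray:
    """Stack the RGB frame and the four PBR maps into a (5, H, W, 3) array.

    Frame 0 is the RGB appearance; frames 1-4 follow PBR_MAP_NAMES
    (an assumed ordering chosen for this sketch).
    """
    frames = [to_three_channels(rgb)]
    for name in PBR_MAP_NAMES:
        frames.append(to_three_channels(pbr_maps[name]))
    if len({frame.shape for frame in frames}) != 1:
        raise ValueError("all frames must share the same height and width")
    return np.stack(frames, axis=0)


if __name__ == "__main__":
    size = 8  # tiny stand-in for a 1024x1024 material
    rng = np.random.default_rng(0)
    rgb = rng.random((size, size, 3), dtype=np.float32)
    pbr = {
        "basecolor": rng.random((size, size, 3), dtype=np.float32),
        "normal": rng.random((size, size, 3), dtype=np.float32),
        "roughness": rng.random((size, size), dtype=np.float32),
        "metallic": rng.random((size, size), dtype=np.float32),
    }
    sequence = build_frame_sequence(rgb, pbr)
    print(sequence.shape)
```

Treating the maps as frames of one sequence is what would let a video diffusion backbone model them jointly. How the frames are then compressed into the two latents is not specified in the abstract and is left out of this sketch.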
