HiMat: DiT-based Ultra-High Resolution SVBRDF Generation
Zixiong Wang, Jian Yang, Yiwei Hu, Milos Hasan, Beibei Wang
TL;DR
HiMat introduces a memory-efficient diffusion framework for native 4K SVBRDF generation that combines a deep compression autoencoder (DC-AE), a linear-attention diffusion transformer (DiT), and a lightweight CrossStitch module that enforces cross-map consistency. It addresses two core challenges: (1) generating multiple 4K reflectance maps within a limited memory and compute budget, and (2) maintaining pixel-level alignment across maps without costly global attention. The approach also enriches material diversity through prompt augmentation with large language models and system prompts, leveraging strong priors from pretrained models. Empirical results show that HiMat delivers high-fidelity, diverse 4K SVBRDFs with practical runtimes on consumer GPUs and generalizes to intrinsic decomposition tasks, establishing a scalable foundation for high-resolution material generation.
Abstract
Creating ultra-high-resolution spatially varying bidirectional reflectance distribution functions (SVBRDFs) is critical for photorealistic 3D content creation, because close-up rendering requires a faithful representation of fine-scale surface details. However, achieving 4K generation faces two key challenges: (1) the need to synthesize multiple reflectance maps at full resolution, which multiplies the pixel budget and imposes prohibitive memory and computational cost, and (2) the requirement to maintain strong pixel-level alignment across maps at 4K, which is particularly difficult when adapting pretrained models designed for the RGB image domain. We introduce HiMat, a diffusion-based framework tailored for efficient and diverse 4K SVBRDF generation. To address the first challenge, HiMat performs generation in a high-compression latent space via DC-AE and employs a pretrained diffusion transformer with linear attention to improve per-map efficiency. To address the second challenge, we propose CrossStitch, a lightweight convolutional module that enforces cross-map consistency without incurring the cost of global attention. Our experiments show that HiMat achieves high-fidelity 4K SVBRDF generation with superior efficiency, structural consistency, and diversity compared to prior methods. Beyond materials, our framework also generalizes to related applications such as intrinsic decomposition.
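
To make the pixel-budget argument concrete, the following back-of-the-envelope sketch counts latent tokens for a 4K map under two autoencoder downsampling factors and compares the rough scaling of softmax attention (quadratic in token count) with linear attention (linear in token count). The downsampling factors, the number of maps, and the channel width are illustrative assumptions, not values reported by the paper, and the cost model drops constants and multi-head structure.

```python
# Illustrative values only; none of these numbers are taken from the paper.
RESOLUTION = 4096   # 4K map, pixels per side
NUM_MAPS = 4        # hypothetical number of reflectance maps
CHANNEL_DIM = 1024  # hypothetical transformer width


def latent_tokens(resolution: int, downsample: int) -> int:
    """Tokens for one square map after spatial downsampling by `downsample`."""
    side = resolution // downsample
    return side * side


def attention_cost(tokens: int, dim: int, linear: bool) -> int:
    """Rough multiply-accumulate scaling of one attention layer.

    Softmax attention scales as N^2 * d; linear attention as N * d^2.
    Constants and head structure are deliberately ignored.
    """
    if linear:
        return tokens * dim * dim
    return tokens * tokens * dim


def summarize(downsample: int) -> dict:
    """Token counts and rough attention costs for one downsampling factor."""
    per_map = latent_tokens(RESOLUTION, downsample)
    return {
        "downsample": downsample,
        "tokens_per_map": per_map,
        "tokens_all_maps": per_map * NUM_MAPS,
        "softmax_cost": attention_cost(per_map, CHANNEL_DIM, linear=False),
        "linear_cost": attention_cost(per_map, CHANNEL_DIM, linear=True),
    }


if __name__ == "__main__":
    for factor in (8, 32):
        print(summarize(factor))
```

Under these assumed values, moving from 8x to 32x spatial downsampling reduces the tokens per map by a factor of 16, and the gap between quadratic and linear attention widens as the token count grows, which is the regime the abstract describes for 4K generation.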
