Meta 3D TextureGen: Fast and Consistent Texture Generation for 3D Objects
Raphael Bensadoun, Yanir Kleiman, Idan Azuri, Omri Harosh, Andrea Vedaldi, Natalia Neverova, Oran Gafni
TL;DR
Meta 3D TextureGen is a fast, two-stage diffusion-based method that conditions texture generation on 3D geometry to achieve global consistency and high fidelity. Stage I generates multi-view 2D renders conditioned on position and normal maps; Stage II backprojects these views into UV space and inpaints the result to produce a complete texture, with an optional enhancement network that upscales it to 4K. The approach yields strong text alignment, fewer seams and Janus artifacts, and better quantitative and qualitative results than prior texture-generation methods, while running as a feedforward pipeline in under 20 seconds. The work targets practical texture authoring for diverse 3D assets in gaming and VR/AR applications, and also discusses limitations and ethical considerations.
Abstract
The recent availability and adaptability of text-to-image models have sparked a new era in many related domains that benefit from the learned text priors as well as high-quality and fast generation capabilities, one of which is texture generation for 3D objects. Although recent texture generation methods achieve impressive results by using text-to-image networks, the combination of global consistency, quality, and speed, which is crucial for advancing texture generation to real-world applications, remains elusive. To that end, we introduce Meta 3D TextureGen: a new feedforward method composed of two sequential networks aimed at generating high-quality and globally consistent textures for arbitrary geometries of any degree of complexity in under 20 seconds. Our method achieves state-of-the-art results in quality and speed by conditioning a text-to-image model on 3D semantics in 2D space and fusing them into a complete and high-resolution UV texture map, as demonstrated by extensive qualitative and quantitative evaluations. In addition, we introduce a texture enhancement network that is capable of upscaling any texture by an arbitrary ratio, producing 4K-resolution textures.
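To make the fusion step more concrete, the following minimal NumPy sketch shows one plausible way to blend per-view colors that have already been backprojected into UV space, and to mark the texels that no view covered and that a subsequent inpainting step would need to fill. This is an illustration under stated assumptions, not the method's actual implementation: the function name `blend_partial_textures`, the array layout, and the use of a generic per-texel confidence weight (for example, the cosine between the surface normal and the view direction) are all hypothetical.

```python
import numpy as np


def blend_partial_textures(partials, weights, eps=1e-8):
    """Blend per-view partial UV textures into a single texture.

    Hypothetical helper for illustration only.

    partials: (V, H, W, 3) colors backprojected from each view into UV space.
    weights:  (V, H, W) non-negative confidence per texel (0 = texel not seen).
    Returns (texture, missing), where `missing` is a boolean (H, W) mask of
    texels that no view covered and that still need to be inpainted.
    """
    partials = np.asarray(partials, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)

    total = weights.sum(axis=0)                              # (H, W)
    weighted = (partials * weights[..., None]).sum(axis=0)   # (H, W, 3)

    missing = total <= eps
    texture = np.zeros_like(weighted)
    # Normalize only where at least one view contributed.
    texture[~missing] = weighted[~missing] / total[~missing][:, None]
    return texture, missing
```

As a usage example, blending a red texel seen with weight 1 and a blue texel seen with weight 3 gives a color that is one quarter red and three quarters blue, while a texel with zero total weight is reported in the `missing` mask and left black for the inpainting stage.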
