Meta 3D TextureGen: Fast and Consistent Texture Generation for 3D Objects

Raphael Bensadoun, Yanir Kleiman, Idan Azuri, Omri Harosh, Andrea Vedaldi, Natalia Neverova, Oran Gafni

TL;DR

Meta 3D TextureGen is a fast, two-stage feedforward framework that conditions a text-to-image diffusion model on 3D geometry to produce globally consistent, high-fidelity textures in under 20 seconds. Stage I generates a multi-view image of the textured shape, conditioned on renders of its position and normal maps. Stage II backprojects these views into UV space and inpaints the uncovered regions to produce a complete texture map, which an optional enhancement network can upscale to 4K resolution. The approach yields strong text alignment, fewer global inconsistencies such as the Janus effect, and better quantitative and qualitative results than prior texture-generation methods. The work targets practical texture authoring for diverse 3D assets in applications such as gaming and VR/AR, and it also outlines limitations and ethical considerations.

Abstract

The recent availability and adaptability of text-to-image models have sparked a new era in many related domains that benefit from the learned text priors as well as high-quality and fast generation capabilities, one of which is texture generation for 3D objects. Although recent texture generation methods achieve impressive results by using text-to-image networks, the combination of global consistency, quality, and speed, which is crucial for advancing texture generation to real-world applications, remains elusive. To that end, we introduce Meta 3D TextureGen: a new feedforward method composed of two sequential networks aimed at generating high-quality and globally consistent textures for arbitrary geometries of any degree of complexity in less than 20 seconds. Our method achieves state-of-the-art results in quality and speed by conditioning a text-to-image model on 3D semantics in 2D space and fusing them into a complete and high-resolution UV texture map, as demonstrated by extensive qualitative and quantitative evaluations. In addition, we introduce a texture enhancement network that is capable of up-scaling any texture by an arbitrary ratio, producing 4K-resolution textures.

Paper Structure

This paper contains 37 sections, 3 equations, 20 figures, 9 tables.

Figures (20)

  • Figure 1: Meta 3D TextureGen: examples of generated textures. Given a 3D shape and a textual prompt, our method generates globally consistent, high-quality textures in under $20$ seconds, while maintaining text faithfulness for both realistic and stylized text prompts.
  • Figure 2: Method overview. Given an input shape and a text prompt, Meta 3D TextureGen generates a globally consistent high-quality texture in less than $20$ seconds. The first stage (left) consists of a geometry-aware text-to-image model that generates a multi-view image of the generated texture, conditioned on renders of the normal and position maps over the input mesh. The second stage (right) consists of a projection of the generated texture renders back to UV space while taking into account the normals and camera angles (weighted incidence). The combined backprojections are then fed into the UV-space inpainting network along with a guiding inpainting mask, as well as the vertex and position UV maps, which generates a complete texture map in UV space. The generated texture map can optionally go through a MultiDiffusion texture enhancement network to increase the resolution by an arbitrary ratio.
  • Figure 3: Contrary to (a) depth renders, (b) position renders are global rather than view-dependent, and (c) normal renders contain high-frequency details.
  • Figure 4: Qualitative comparison with previous work (local consistency, quality and text alignment). Compared with previous work, our method results in higher-quality textures, while preserving local consistency and adhering to the text prompt.
  • Figure 5: Qualitative comparison with previous work (global consistency, quality and text alignment). While previous methods result in global inconsistencies such as the Janus effect (blue rectangles), as well as text misalignments, our method returns globally consistent and highly text-aligned textures. Text prompts: (i) top-left: "a bunny made out of small pebbles of many shades of gray", (ii) top-right: "a realistic white rabbit with long fur, pink eyes, and black paws", (iii) bottom-left: "a sand sculpture of a bunny with engraving of an intricate pattern", (iv) bottom-right: "a bunny with a velvet purple coat with intricate gold embroidery along the edges".
  • ...and 15 more figures
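
The Figure 2 caption describes backprojecting the generated views into UV space "while taking into account the normals and camera angles (weighted incidence)" and passing a guiding mask to the inpainting network. The sketch below illustrates one plausible reading of that step in NumPy. It is not the authors' implementation: the function names `incidence_weights` and `blend_backprojections`, the cosine weighting, the `power` exponent, and the toy data are all assumptions made for illustration, and occlusion testing (which a real backprojection would need) is omitted.

```python
import numpy as np


def incidence_weights(normals, to_camera, power=1.0):
    """Hypothetical per-texel weight: clamped cosine between normal and view direction.

    normals:   (H, W, 3) unit surface normals stored per UV texel.
    to_camera: (3,) direction from the surface toward the camera.
    """
    v = np.asarray(to_camera, dtype=float)
    v = v / np.linalg.norm(v)
    cos = np.clip(normals @ v, 0.0, None)  # back-facing texels get weight 0
    return cos ** power


def blend_backprojections(colors, weights, eps=1e-8):
    """Weighted average of per-view UV backprojections.

    colors:  (V, H, W, 3) color each view contributes to each texel.
    weights: (V, H, W) incidence weights for the same views.
    Returns the blended (H, W, 3) texture and a boolean (H, W) mask that is
    True where no view contributed, i.e. where inpainting would be needed.
    """
    total = weights.sum(axis=0)
    blended = (colors * weights[..., None]).sum(axis=0)
    blended = blended / np.maximum(total, eps)[..., None]
    inpaint_mask = total <= eps
    return blended, inpaint_mask


# Toy 1x3 UV map: texels face +z, +x, and -y respectively (assumed data).
normals = np.array([[[0, 0, 1], [1, 0, 0], [0, -1, 0]]], dtype=float)
cameras = [np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 1.0])]
weights = np.stack([incidence_weights(normals, c) for c in cameras])

# View 0 paints everything red, view 1 paints everything blue.
colors = np.zeros((2, 1, 3, 3))
colors[0, ..., 0] = 1.0
colors[1, ..., 2] = 1.0

blended, inpaint_mask = blend_backprojections(colors, weights)
print(blended.round(3))
print(inpaint_mask)
```

In this toy setup the first texel is seen by both cameras and receives a mix dominated by the head-on view, the second is seen only at a grazing angle by the second camera, and the third faces away from both, so it is flagged in the mask. Down-weighting grazing views is a common design choice because texels seen at steep angles are sampled from few, stretched image pixels.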