Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds

Xiaoyu Xiang, Liat Sless Gorelik, Yuchen Fan, Omri Armstrong, Forrest Iandola, Yilei Li, Ita Lifshitz, Rakesh Ranjan

TL;DR

Make-A-Texture addresses the need for rapid, multiview-consistent texture synthesis for arbitrary 3D geometries from text prompts. It combines a depth-conditioned, diffusion-based inpainting model with geometry-aware multiview generation, automatic view selection, and a fast backprojection pipeline to achieve about 3 seconds end-to-end on a single H100 GPU. Key contributions include front-back simultaneous generation for global consistency, robust filtering to suppress artifacts on open-surface meshes, and substantial speedups over prior diffusion-based methods while maintaining high texture quality. The approach enables interactive texture editing, rapid asset prototyping, and theming across asset collections, broadening access to high-quality 3D textures in games, VR, and related workflows.

Abstract

We present Make-A-Texture, a new framework that efficiently synthesizes high-resolution texture maps from textual prompts for given 3D geometries. Our approach progressively generates textures that are consistent across multiple viewpoints using a depth-aware inpainting diffusion model, following an optimized sequence of viewpoints determined by an automatic view selection algorithm. A significant feature of our method is its efficiency: it generates a full texture with an end-to-end runtime of just 3.07 seconds on a single NVIDIA H100 GPU, significantly outperforming existing methods. This acceleration is achieved through optimizations in the diffusion model and a specialized backprojection method. Moreover, our method reduces artifacts in the backprojection phase by selectively masking out non-frontal faces and the internal faces of open-surfaced objects. Experimental results demonstrate that Make-A-Texture matches or exceeds the quality of other state-of-the-art methods. Our work significantly improves the applicability and practicality of texture generation models for real-world 3D content creation, including interactive creation and text-guided texture editing.

Paper Structure

This paper contains 23 sections, 4 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: Method overview. The texture is generated iteratively from different viewpoints using a pretrained diffusion model. In the first stage, we generate the front and back views together for better global consistency. In the following stages, the output RGB is conditioned on both the geometry and the existing textures via inpainting. The generated images are backprojected to the mesh surface for the next stage.
  • Figure 2: Filtering out non-frontal-facing regions. From the input depth, we can derive normals and a binary frontal-facing mask by thresholding. This mask guides the backprojection process. Thus, areas like the side or seam of the jacket are not mapped to the output texture, as shown in the rightmost image.
  • Figure 3: Example of different shape representations.
  • Figure 4: Artifact illustration without rendering internal faces.
  • Figure 5: With the new mask computed from the differences of (a) and (b), the artifacts from the internal faces can be avoided in backprojection.
  • ...and 5 more figures
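
The filtering step described in the Figure 2 caption (depth to normals to a thresholded frontal-facing mask) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `frontal_mask`, the finite-difference normal estimate, the orthographic-camera assumption, the `cos_threshold` value of 0.5, and the convention that background pixels have depth 0 are all assumptions made here for illustration.

```python
import numpy as np

def frontal_mask(depth, cos_threshold=0.5, background=0.0):
    """Hypothetical helper: estimate normals from a depth map of shape (H, W)
    and return (normals, mask), where mask marks frontal-facing pixels."""
    depth = np.asarray(depth, dtype=np.float64)
    # Finite-difference depth gradients along rows (y) and columns (x).
    dz_dy, dz_dx = np.gradient(depth)
    # For a surface z = f(x, y), an unnormalized normal is (-dz/dx, -dz/dy, 1).
    # Assumes an orthographic camera looking along the z axis.
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    # The z component is the cosine of the angle between normal and view axis.
    # Keep pixels that face the camera closely enough and are not background.
    mask = (normals[..., 2] >= cos_threshold) & (depth != background)
    return normals, mask

# Usage: a flat surface facing the camera is kept entirely.
flat_normals, flat_keep = frontal_mask(np.ones((4, 4)))
```

Under these assumptions, only pixels where the mask is true would be written to the texture during backprojection, so grazing-angle regions such as the jacket's sides or seams are skipped. The threshold controls the trade-off: a higher value rejects more stretched pixels but leaves more of the surface to be filled from later views.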