Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis

Yifan Wang; Aleksander Holynski; Brian L. Curless; Steven M. Seitz

TL;DR

Infinite Texture addresses the challenge of generating high-resolution, diverse textures guided by natural language. It introduces a three-stage pipeline: (1) generate a 1024×1024 reference texture from a text prompt using a diffusion-based T2I model, (2) fine-tune a diffusion model per texture to learn its statistics from patches with a unique identifier, and (3) synthesize arbitrarily large textures by patch-wise denoising with score aggregation on a single GPU. The approach yields textures that preserve reference statistics while permitting stochastic variations, demonstrated through 3D rendering and texture transfer, and shows competitive realism in human studies compared with traditional baselines. While effective and GPU-friendly, the method inherits diffusion-model limitations such as color drift and directional bias, suggesting future work in mitigating lighting artifacts and improving handling of anisotropic textures.
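The patch-based training data in stage (2) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' code: the function name `random_crops`, the 256-pixel crop size, and the random stand-in array used as a reference texture are all hypothetical choices made here.

```python
import numpy as np

def random_crops(reference, crop_size, num_crops, rng):
    """Sample square crops uniformly at random from an (H, W, C) texture."""
    h, w = reference.shape[:2]
    if crop_size > min(h, w):
        raise ValueError("crop_size exceeds the reference texture size")
    crops = []
    for _ in range(num_crops):
        # Top-left corner chosen so the crop stays fully inside the image.
        top = int(rng.integers(0, h - crop_size + 1))
        left = int(rng.integers(0, w - crop_size + 1))
        crops.append(reference[top:top + crop_size, left:left + crop_size])
    return np.stack(crops)

rng = np.random.default_rng(0)
# Stand-in for a 1024x1024 reference texture (assumption: RGB floats in [0, 1)).
reference = rng.random((1024, 1024, 3), dtype=np.float32)
batch = random_crops(reference, crop_size=256, num_crops=4, rng=rng)
```

In an actual fine-tuning loop, each crop would be paired with the prompt containing the unique identifier; that part depends on the training framework and is omitted here.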

Abstract

We present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt. Our approach fine-tunes a diffusion model on a single texture, and learns to embed that statistical distribution in the output domain of the model. We seed this fine-tuning process with a sample texture patch, which can be optionally generated from a text-to-image model like DALL-E 2. At generation time, our fine-tuned diffusion model is used through a score aggregation strategy to generate output texture images of arbitrary resolution on a single GPU. We compare synthesized textures from our method to existing work in patch-based and deep learning texture synthesis methods. We also showcase two applications of our generated textures in 3D rendering and texture transfer.

Paper Structure (15 sections, 3 equations, 7 figures)

Introduction
Related Work
    Texture synthesis
    Text-to-image synthesis
Method
    Diffusion Models
    Per-Texture Fine-tuning
    Texture Synthesis
Evaluation
    Ablation Studies
    Baseline Comparisons
Applications
    Texture Generation and Application to 3D Models
    Texture Transfer
Conclusion and Discussions

Figures (7)

Figure 1: Infinite Texture generates arbitrarily large (examples here are 85MP), high-quality textures from text prompts. Given a single text prompt as input, Infinite Texture generates a diverse collection of textures (leather on the top row). Our method succeeds in reproducing both statistically periodic textures (fabric on the bottom row) and challenging ones with depth variations (honeycomb on the bottom row). Close-up views of the samples are depicted within the white boxes.
Figure 2: Overview. Infinite Texture consists of three stages: (1) generating a reference texture image from the text prompt; (2) fine-tuning a diffusion model to learn the texture statistics of that reference image, training on random crops of the image paired with a unique identifier, with the text encoder also trained during this stage; and (3) using the trained diffusion model to synthesize arbitrarily large textures. At every diffusion timestep in stage (3), we denoise small random patches and average their noise estimates in overlapping regions; the resulting noise estimate is used to perform a DDIM sampling step.
Figure 3: Comparisons with baseline methods in texture synthesis. We use texture images generated by DALL-E 2 [ramesh2022hierarchical] as input exemplar textures. Infinite Texture stands out by generating the most consistent textures with variations. In contrast, image quilting [efros2001image] often results in repetitive patterns due to its tiling strategy. STTO [kaspar2015self] tends to converge to solutions where a smooth patch is repeated over and over. PSGAN [bergmann2017learning] struggles to capture the high-frequency signal effectively with its periodic signal generator. Non-stationary texture synthesis [zhou2018non] falls short in producing textures with variations because it is trained only on small patches.
Figure 4: Ablation studies. Keeping the text encoder fixed during fine-tuning introduces color drift due to inherited semantic priors. Using fixed crops instead of random crops at test time maintains the same image quality but increases the runtime from 6 minutes (random crops) to 50 minutes.
Figure 5: Results of synthesized textures and renderings. We demonstrate high-resolution textures generated by Infinite Texture, along with photo-realistic renderings of an armchair utilizing the textures (including the wood armrest). Infinite Texture is capable of generating textures with different variations when provided with the same text prompt. These textures can be seamlessly incorporated into 3D rendering pipelines, resulting in nearly infinite material choices for assets in a 3D shape collection. We use the default UV mapping from the CAD model to warp the texture onto the 3D model.
...and 2 more figures
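
The score aggregation described in the Figure 2 caption can be sketched as follows. This is a toy illustration under stated assumptions rather than the paper's implementation: `aggregate_noise`, `ddim_step`, the toy denoiser, the patch size, and the patch count are all hypothetical, and the update shown is the standard deterministic DDIM step (eta = 0) with `alpha_t` and `alpha_prev` denoting cumulative noise-schedule products.

```python
import numpy as np

def aggregate_noise(x_t, denoise_patch, patch, num_patches, rng):
    """Average per-patch noise estimates over a large (H, W, C) canvas."""
    h, w = x_t.shape[:2]
    eps_sum = np.zeros_like(x_t)
    count = np.zeros((h, w, 1), dtype=x_t.dtype)
    for _ in range(num_patches):
        top = int(rng.integers(0, h - patch + 1))
        left = int(rng.integers(0, w - patch + 1))
        window = (slice(top, top + patch), slice(left, left + patch))
        eps_sum[window] += denoise_patch(x_t[window])  # accumulate estimates
        count[window] += 1                             # track overlap count
    # Mean estimate where patches overlap; pixels no patch touched stay at zero.
    eps = eps_sum / np.maximum(count, 1)
    return eps, (count > 0)[..., 0]

def ddim_step(x_t, eps, alpha_t, alpha_prev):
    """Deterministic DDIM update from timestep t to the previous timestep."""
    x0 = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    return np.sqrt(alpha_prev) * x0 + np.sqrt(1.0 - alpha_prev) * eps

rng = np.random.default_rng(0)
x_t = rng.standard_normal((96, 128, 3))   # small stand-in for a large canvas
toy_denoiser = lambda p: 0.5 * p          # placeholder for the fine-tuned model
eps, covered = aggregate_noise(x_t, toy_denoiser, patch=32, num_patches=200, rng=rng)
x_prev = ddim_step(x_t, eps, alpha_t=0.5, alpha_prev=0.7)
```

Because only one patch passes through the denoiser at a time, peak memory is set by the patch size rather than the canvas size, which is consistent with the single-GPU claim in the abstract. The sketch leaves uncovered pixels with a zero noise estimate, so a real sampler would need enough patches per timestep to cover the canvas.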
