Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis
Yifan Wang, Aleksander Holynski, Brian L. Curless, Steven M. Seitz
TL;DR
Infinite Texture generates high-resolution, diverse textures from natural-language prompts. It uses a three-stage pipeline: (1) generate a 1024×1024 reference texture from a text prompt with a text-to-image (T2I) diffusion model; (2) fine-tune a diffusion model per texture on patches tagged with a unique identifier, so that it learns the texture's statistics; and (3) synthesize arbitrarily large textures by patch-wise denoising with score aggregation on a single GPU. The resulting textures preserve the statistics of the reference while allowing stochastic variation. They are demonstrated in 3D rendering and texture transfer, and show competitive realism in human studies compared with traditional baselines. The method inherits diffusion-model limitations such as color drift and directional bias, which suggests future work on mitigating lighting artifacts and improving the handling of anisotropic textures.
Abstract
We present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt. Our approach fine-tunes a diffusion model on a single texture and learns to embed that statistical distribution in the output domain of the model. We seed this fine-tuning process with a sample texture patch, which can optionally be generated from a text-to-image model like DALL-E 2. At generation time, our fine-tuned diffusion model is used through a score aggregation strategy to generate output texture images of arbitrary resolution on a single GPU. We compare synthesized textures from our method to existing patch-based and deep learning texture synthesis methods. We also showcase two applications of our generated textures in 3D rendering and texture transfer.
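The summary does not spell out how score aggregation works, so the following is only a minimal sketch of one common reading: run the per-patch predictor on overlapping windows of a large canvas and average the predictions wherever windows overlap. The names `patch_starts`, `aggregate_scores`, and `predict` are hypothetical, the uniform averaging is an assumption rather than the paper's confirmed scheme, and plain Python lists stand in for the tensors a real implementation would use.

```python
from typing import Callable, List

Grid = List[List[float]]  # single-channel canvas as rows of floats


def patch_starts(size: int, patch: int, stride: int) -> List[int]:
    """Start offsets so that windows of width `patch` cover [0, size)."""
    if not 1 <= stride <= patch <= size:
        raise ValueError("need 1 <= stride <= patch <= size")
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # last window sits flush with the border
    return starts


def aggregate_scores(
    canvas: Grid,
    predict: Callable[[Grid], Grid],  # hypothetical per-patch denoiser/score model
    patch: int,
    stride: int,
) -> Grid:
    """Average per-patch predictions over every overlapping window (assumed scheme)."""
    height, width = len(canvas), len(canvas[0])
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]

    for top in patch_starts(height, patch, stride):
        for left in patch_starts(width, patch, stride):
            window = [row[left:left + patch] for row in canvas[top:top + patch]]
            estimate = predict(window)  # same shape as `window`
            for dy in range(patch):
                for dx in range(patch):
                    total[top + dy][left + dx] += estimate[dy][dx]
                    count[top + dy][left + dx] += 1

    # Every pixel is covered at least once because stride <= patch.
    return [
        [total[y][x] / count[y][x] for x in range(width)]
        for y in range(height)
    ]
```

In a full sampler this aggregation would run once per denoising step, with `predict` wrapping the fine-tuned model at the current noise level. Because only one patch is processed at a time, memory is bounded by the patch size rather than the canvas size, which is consistent with the single-GPU claim above.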
