TextureDiffusion: Target Prompt Disentangled Editing for Various Texture Transfer
Zihan Su, Junhao Zhuang, Chun Yuan
TL;DR
TextureDiffusion addresses the difficulty of transferring complex textures by setting the target prompt to "<texture>" alone, which disentangles the texture from the input image content. A structure-preservation module reuses query features from self-attention and features from residual blocks, and an edit localization technique uses cross-attention maps to confine edits to the target region. The method is tuning-free and operates in the Stable Diffusion latent space, producing harmonious texture transfer while preserving structure and background. Experiments on PIE-Bench show superior performance against multiple baselines, and the authors release code at https://github.com/THU-CVML/TextureDiffusion.
Abstract
Recently, text-guided image editing has achieved significant success. However, when changing the texture of an object, existing methods can only apply simple textures such as wood or gold; complex textures such as cloud or fire remain a challenge. This limitation arises because the target prompt must contain both the input image content and <texture>, which restricts the texture representation. In this paper, we propose TextureDiffusion, a tuning-free image editing method for transferring a wide variety of textures. First, the target prompt is set directly to "<texture>", disentangling the texture from the input image content and enhancing the texture representation. Second, query features in self-attention and features in residual blocks are used to preserve the structure of the input image. Finally, to maintain the background, we introduce an edit localization technique that blends the self-attention results and the intermediate latents. Comprehensive experiments demonstrate that TextureDiffusion can harmoniously transfer various textures with excellent structure and background preservation. Code is publicly available at https://github.com/THU-CVML/TextureDiffusion.
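
The two mechanisms named in the abstract, query reuse in self-attention and mask-based blending, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the min-max normalization, the 0.5 threshold, and the choice to take queries from the source branch while keys and values come from the texture branch are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    # Scaled dot-product attention over N tokens of dimension d.
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax((q @ k.T) * scale, axis=-1) @ v

def mask_from_cross_attention(attn_map, threshold=0.5):
    # Hypothetical helper: min-max normalize a cross-attention map
    # and binarize it to mark the region to edit (assumed threshold).
    lo, hi = attn_map.min(), attn_map.max()
    norm = (attn_map - lo) / (hi - lo + 1e-8)
    return (norm > threshold).astype(np.float32)

def blend(edited, source, mask):
    # Keep edited values inside the mask and source values outside it.
    return mask * edited + (1.0 - mask) * source

rng = np.random.default_rng(0)
n_tokens, dim = 16, 8

# Source-branch queries (assumed to carry structure) and
# texture-branch keys/values (assumed to carry appearance).
q_src = rng.normal(size=(n_tokens, dim))
k_src = rng.normal(size=(n_tokens, dim))
v_src = rng.normal(size=(n_tokens, dim))
k_tex = rng.normal(size=(n_tokens, dim))
v_tex = rng.normal(size=(n_tokens, dim))

out_src = self_attention(q_src, k_src, v_src)
out_edit = self_attention(q_src, k_tex, v_tex)  # query reuse

# One cross-attention score per token for the object being edited.
cross_attn = rng.random(size=(n_tokens, 1))
mask = mask_from_cross_attention(cross_attn)

# Blend the self-attention results; the same blend could be applied
# to intermediate latents with a mask resized to the latent grid.
out = blend(out_edit, out_src, mask)
```

Outside the mask the blended output equals the source branch exactly, which is how the background is left untouched in this sketch.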
