TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model
Yongming Zhang, Tianyu Zhang, Haoran Xie
TL;DR
The paper addresses the challenge of generating richly textured fashion images from sketches by introducing TexControl, a two-stage diffusion-based framework. The base stage uses a sketch-conditioned ControlNet to produce outline previews, while the texture stage applies an image-to-image ControlNet with a model-merge strategy to refine textures and materials, yielding more realistic garments. Key contributions include (1) a practical two-stage decomposition that separates geometry control from texture control, (2) a model-merge technique that fuses scribble and latent diffusion representations, and (3) qualitative improvements in texture detail and outline fidelity over single-stage baselines. The approach could assist fashion designers by providing controllable, texture-rich designs, with future work focusing on quantitative metrics and dataset improvements to reduce body-clothing entanglement.
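The summary above does not say how the model merge is carried out. One common way to merge two checkpoints, assumed here purely for illustration, is a weighted sum of matching parameters. The sketch below shows that idea on plain Python lists; `merge_state_dicts`, `StateDict`, and `alpha` are hypothetical names and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_state_dicts(a: StateDict, b: StateDict, alpha: float = 0.5) -> StateDict:
    """Return (1 - alpha) * a + alpha * b for every parameter.

    Assumption: both checkpoints share the same parameter names and sizes.
    The actual merge used by TexControl may differ.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have the same parameter names")

    merged: StateDict = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise linear interpolation between the two checkpoints.
        merged[name] = [
            (1.0 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])
        ]
    return merged
```

With `alpha = 0` the result equals the first checkpoint, and with `alpha = 1` it equals the second; intermediate values blend the two.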
Abstract
Deep learning-based sketch-to-clothing image generation provides initial designs and inspiration in the fashion design process. However, generating clothing from freehand drawings is challenging because sketches carry sparse and ambiguous information, and current generation models may have difficulty producing detailed textures. In this work, we propose TexControl, a sketch-based fashion generation framework that uses a two-stage pipeline to generate the fashion image corresponding to the input sketch. First, we adopt ControlNet to generate the fashion image from the sketch while keeping the image outline stable. Then, we use an image-to-image method to optimize the detailed textures of the generated image and obtain the final result. The evaluation results show that TexControl can generate fashion images with high-quality, fine-grained textures.
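The two-stage flow described in the abstract can be sketched as a simple composition of two callables. This is a minimal illustration, not the authors' implementation: `generate_fashion_image`, `base_stage`, `texture_stage`, and the `strength` parameter are all hypothetical names. In a real system, `base_stage` would wrap a sketch-conditioned ControlNet and `texture_stage` an image-to-image pass.

```python
from typing import Any, Callable

# Hypothetical stage signatures (assumptions, not from the paper):
#   base_stage(sketch, prompt) -> preview image with a stable outline
#   texture_stage(image, prompt, strength) -> image with refined textures
BaseStage = Callable[[Any, str], Any]
TextureStage = Callable[[Any, str, float], Any]


def generate_fashion_image(
    sketch: Any,
    prompt: str,
    base_stage: BaseStage,
    texture_stage: TextureStage,
    strength: float = 0.5,
) -> Any:
    """Run the base stage, then refine its output with the texture stage.

    `strength` is an assumed image-to-image knob controlling how far the
    texture stage may depart from the preview; the paper may use other controls.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    # Stage 1: turn the sketch into an outline-preserving preview.
    preview = base_stage(sketch, prompt)
    # Stage 2: refine textures and materials on top of the preview.
    return texture_stage(preview, prompt, strength)
```

Keeping the stages as separate callables mirrors the separation of outline control from texture refinement, so either stage can be swapped or tuned without touching the other.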
