TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model

Yongming Zhang, Tianyu Zhang, Haoran Xie

TL;DR

The paper tackles the challenge of generating richly textured fashion images from sketches by introducing TexControl, a two-stage diffusion-based framework. The base stage uses a sketch-conditioned ControlNet (Scribble) to produce outline previews. The texture stage then uses an image-to-image ControlNet (ip2p) with a model-merge strategy to enforce textures and materials, which yields more realistic garments. Key contributions include (1) a practical two-stage decomposition that separates geometry control from texture control, (2) a model-merge technique that fuses scribble and latent diffusion representations, and (3) a qualitative demonstration of improved texture detail and outline fidelity over single-stage baselines. The approach could help fashion designers by providing controllable, texture-rich design images. Future work focuses on quantitative metrics and on dataset improvements that reduce body-clothing entanglement.
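
This summary does not state the exact merge rule used in the texture stage. The sketch below is a hedged illustration that assumes the common "weighted sum" merge used in Stable Diffusion checkpoint merging. Under that rule, each parameter of the merged model is a linear interpolation of the matching parameters of two models that share an architecture. Plain Python lists stand in for weight tensors, and `merge_weighted_sum`, `Params`, and `alpha` are hypothetical names, not the paper's code.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat weights.
Params = Dict[str, List[float]]


def merge_weighted_sum(a: Params, b: Params, alpha: float) -> Params:
    """Return (1 - alpha) * a + alpha * b, computed parameter by parameter.

    Assumes both models share one architecture (same names, same sizes).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if a.keys() != b.keys():
        raise KeyError("models must have identical parameter names")
    merged: Params = {}
    for name, wa in a.items():
        wb = b[name]
        if len(wa) != len(wb):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise linear interpolation between the two models' weights.
        merged[name] = [(1.0 - alpha) * x + alpha * y for x, y in zip(wa, wb)]
    return merged


if __name__ == "__main__":
    model_a = {"layer.weight": [0.0, 2.0]}
    model_b = {"layer.weight": [1.0, 4.0]}
    print(merge_weighted_sum(model_a, model_b, 0.5))  # {'layer.weight': [0.5, 3.0]}
```

With `alpha = 0` the result is model `a` unchanged, and with `alpha = 1` it is model `b`. Intermediate values blend the behaviour of the two models. Interpolating weights only makes sense when both checkpoints share an architecture, and in practice a common base model.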

Abstract

Deep learning-based sketch-to-clothing image generation can provide initial designs and inspiration during the fashion design process. However, generating clothing from freehand drawings is challenging because drawn sketches carry sparse and ambiguous information. Current generative models may struggle to produce detailed textures. In this work, we propose TexControl, a sketch-based fashion generation framework that uses a two-stage pipeline to generate a fashion image corresponding to the input sketch. First, we adopt ControlNet to generate a fashion image from the sketch while keeping the image outline stable. Then, we use an image-to-image method to refine the detailed textures of the generated image and obtain the final result. The evaluation results show that TexControl generates fashion images with high-quality, fine-grained textures.
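
The two-stage pipeline in the abstract can be sketched as plain function composition. This is a minimal, framework-agnostic sketch. `base_stage` and `texture_stage` are hypothetical callables that stand in for the sketch-conditioned ControlNet (Scribble) stage and the image-to-image ControlNet (ip2p) stage. The paper's actual models, prompts, and sampling settings are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

Img = TypeVar("Img")  # placeholder for whatever image type the stages use


@dataclass
class TwoStageResult(Generic[Img]):
    preview: Img  # outline preview produced from the sketch (base generation stage)
    final: Img    # texture-refined image (texture control stage)


def generate_two_stage(
    sketch: Img,
    prompt: str,
    base_stage: Callable[[Img, str], Img],
    texture_stage: Callable[[Img, str], Img],
) -> TwoStageResult[Img]:
    # Stage 1 (geometry): condition on the sparse sketch to fix the garment outline.
    preview = base_stage(sketch, prompt)
    # Stage 2 (texture): condition on the dense preview instead of the sketch,
    # so this stage can concentrate on materials and fine texture.
    final = texture_stage(preview, prompt)
    return TwoStageResult(preview=preview, final=final)


if __name__ == "__main__":
    # Stub stages that only record what they received.
    base = lambda s, p: f"preview({s}, {p})"
    texture = lambda img, p: f"textured({img}, {p})"
    result = generate_two_stage("sketch.png", "red wool coat", base, texture)
    print(result.final)  # textured(preview(sketch.png, red wool coat), red wool coat)
```

The function returns the intermediate preview alongside the final image. This keeps the stage boundary visible, and that boundary is where Figure 1 places the outline preview images.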

Paper Structure

This paper contains 12 sections, 6 equations, 8 figures.

Figures (8)

  • Figure 1: The proposed method, TexControl, takes sketches as conditional input and generates finely designed clothing images whose textures are consistent with the text input. The outline preview images divide TexControl into two stages: a sketch-to-image stage and an image-to-image stage.
  • Figure 2: The framework of TexControl. TexControl consists of two stages: the base generation stage uses ControlNet Scribble to generate an outline preview, and the texture control stage uses ControlNet ip2p with model merge to generate the finely designed result. $Z_T$ is the latent representation in latent space, and $T$ is the timestep.
  • Figure 3: We collected diverse sketches through various sources and approaches.
  • Figure 4: Comparison of results from TexControl (ours) and ControlNet.
  • Figure 5: TexControl excels at generating fine-grained textures.
  • Figures 6 to 8: not listed.
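
Figure 2 writes $Z_T$ for the latent at timestep $T$. The equations below assume that TexControl uses the standard latent diffusion formulation that ControlNet builds on. The paper's own six equations are not reproduced in this summary. Under that assumption, the noised latent at timestep $t$ and the ControlNet training objective can be written as follows. $c_{\text{text}}$ is the text prompt, and $c_{\text{ctrl}}$ is the control image: the sketch in the base generation stage and the outline preview in the texture control stage. $\beta_s$ is the noise schedule, and $\epsilon_\theta$ is the noise-prediction network.

```latex
\begin{aligned}
Z_t &= \sqrt{\bar{\alpha}_t}\, Z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I),
  \qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s), \\
\mathcal{L} &= \mathbb{E}_{Z_0, t, c_{\text{text}}, c_{\text{ctrl}}, \epsilon}
  \left[ \left\| \epsilon - \epsilon_\theta\!\left(Z_t, t, c_{\text{text}}, c_{\text{ctrl}}\right) \right\|_2^2 \right].
\end{aligned}
```

Plain text-to-image sampling starts from $Z_T \sim \mathcal{N}(0, I)$, which is reasonable because $\bar{\alpha}_T \approx 0$. It then denoises step by step to $Z_0$, which the VAE decoder maps back to an image. An image-to-image variant may instead start from a partially noised latent of its input image.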