Semi-supervised Latent Disentangled Diffusion Model for Textile Pattern Generation

Chenggong Hu, Yi Wang, Mengqi Xue, Haofei Zhang, Jie Song, Li Sun

Abstract

Textile pattern generation (TPG) aims to synthesize fine-grained textile pattern images based on given clothing images. Although previous studies have not explicitly investigated TPG, existing image-to-image models appear to be natural candidates for this task. However, when applied directly, these methods often produce unfaithful results, failing to preserve fine-grained details due to feature confusion between complex textile patterns and the inherent non-rigid texture distortions in clothing images. In this paper, we propose a novel method, SLDDM-TPG, for faithful and high-fidelity TPG. Our method consists of two stages: (1) a latent disentangled network (LDN) that resolves feature confusion in clothing representations and constructs a multi-dimensional, independent clothing feature space; and (2) a semi-supervised latent diffusion model (S-LDM), which receives guidance signals from LDN and generates faithful results through semi-supervised diffusion training, combined with our designed fine-grained alignment strategy. Extensive evaluations show that SLDDM-TPG reduces FID by 4.1 and improves SSIM by up to 0.116 on our CTP-HD dataset, and also demonstrate good generalization on the VITON-HD dataset.

Paper Structure (12 sections, 11 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Textile pattern images generated by our SLDDM-TPG from real worn clothing on our CTP-HD dataset. The first row shows the clothing images and the second row shows the generated textile pattern images.
  • Figure 2: The framework of SLDDM-TPG. During training, LDN is first trained to disentangle the features of clothing images ${C}$ into a textile pattern content feature ${f_S^c}$, a predicted structured feature $f_A^c$, and a texture defect feature $f_T^c$. S-LDM is then trained, guided by LDN's features, to generate pattern images ${P}$ aligned with image ${C}$. Each batch contains data both with and without ground truth. The two S-LDMs share the same network and parameters, but their inputs, computed losses, and loss comparison targets differ, as shown by the red parts of stage-II. During inference, a clothing image ${C}$ is fed into LDN, whose output features are passed to S-LDM as conditions, and $T$-step denoising is then applied to the noise input to generate the final result.
  • Figure 3: Qualitative comparisons of different methods for TPG on our CTP-HD dataset, with and without GT.
  • Figure 4: Generalization performance comparisons of different methods on the VITON-HD dataset without GT.
  • Figure 5: Qualitative ablation study on LDN using each individual feature of LDN and the original undisentangled feature.
  • ...and 2 more figures