Product-Level Try-on: Characteristics-preserving Try-on with Realistic Clothes Shading and Wrinkles

Yanlong Zang, Han Yang, Jiaxu Miao, Yi Yang

TL;DR

The paper tackles product-level virtual try-on by preserving static garment details (textures, logos, embroideries) while generating realistic dynamic shading and folds that adapt to pose and environment. It introduces PLTON, a diffusion-prior-based pipeline that decouples static characteristics transformation from adaptive dynamic rendering using a High-Frequency Map (HF-Map), a Dynamic Extractor, and a Static Extractor. A two-stage blended denoising strategy enables high-resolution outputs from a limited dataset, improving stability and reducing artifacts. Experimental results on 1024×768 images show state-of-the-art realism, detail retention, and robustness to parsing errors and pose variations, with strong qualitative and quantitative performance advantages over baselines.

Abstract

Image-based virtual try-on systems, which fit new garments onto human portraits, are gaining research attention. An ideal pipeline should preserve the static features of clothes (like textures and logos) while also generating dynamic elements (e.g., shadows, folds) that adapt to the model's pose and environment. Previous works fail specifically in generating dynamic features, as they trivially preserve the warped in-shop clothes by compositing them with a predicted alpha mask. To break the dilemma of over-preservation and texture loss, we propose a novel diffusion-based product-level virtual try-on pipeline, i.e., PLTON, which can preserve the fine details of logos and embroideries while producing realistic clothes shading and wrinkles. The main insights are threefold: 1) Adaptive Dynamic Rendering: We take a pre-trained diffusion model as a generative prior and tame it with image features, training a dynamic extractor from scratch to generate dynamic tokens that preserve high-fidelity semantic information. Due to the strong generative power of the diffusion prior, we can generate realistic clothes shadows and wrinkles. 2) Static Characteristics Transformation: The High-frequency Map (HF-Map) is our fundamental insight for static representation. PLTON first warps in-shop clothes to the target model pose with a traditional warping network, and uses a high-pass filter to extract an HF-Map for preserving static cloth features. The HF-Map is used to generate modulation maps through our static extractor, which are injected into a fixed U-Net to synthesize the final result. To enhance retention, a Two-stage Blended Denoising method is proposed to guide the diffusion process toward the correct spatial layout and color. PLTON is fine-tuned only on our small collected try-on dataset. Extensive quantitative and qualitative experiments on 1024×768 datasets demonstrate the superiority of our framework in mimicking real clothes dynamics.
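To make the HF-Map idea concrete, here is a minimal sketch of extracting a high-frequency map from a warped garment image. The abstract only says that a high-pass filter is used. The specific filter below (the image minus a Gaussian-blurred copy), the `sigma` value, and the function names `gaussian_blur` and `hf_map` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma=3.0):
    """Separable Gaussian blur of an (H, W) or (H, W, C) image.

    Uses reflect padding, so H and W must exceed the kernel radius.
    """
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    out = img.astype(np.float64)
    for axis in (0, 1):  # blur rows, then columns
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="reflect")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded
        )
    return out

def hf_map(warped_cloth, sigma=3.0):
    """Assumed high-pass filter: image minus its low-pass (blurred) copy.

    Smooth regions (flat color, soft shading) map to roughly zero, while
    sharp structure such as logo edges and fine texture is retained.
    """
    return warped_cloth.astype(np.float64) - gaussian_blur(warped_cloth, sigma)

if __name__ == "__main__":
    # Hypothetical stand-in for a warped garment: flat color plus one sharp detail.
    cloth = np.full((64, 48), 0.2)
    cloth[32, 24] = 1.0
    hf = hf_map(cloth)
    print(hf.shape, float(np.abs(hf).max()))
```

Under this assumption, a larger `sigma` lets more mid-frequency content (broader patterns) into the map, while a smaller one keeps only the finest edges.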

Paper Structure (16 sections, 4 equations, 8 figures, 2 tables, 1 algorithm)

Figures (8)

  • Figure 1: Visual comparison of PLTON and four other traditional virtual try-on algorithms in generating clothes shadows and folds. To eliminate the influence of the original clothes' dynamic features, we fill the clothes with three solid colors: red, green, and blue (from top to bottom).
  • Figure 2: A schematic of PLTON. We utilize the warping module to deform the in-shop clothes using pose (OpenPose) and other conditions (e.g., DensePose, and segmentation from Graphonomy and ATR). First, we apply high-pass filters to the warped cloth to extract its high-frequency features. Then, we employ a Static Extractor to extract modulated prior maps from the HF-Map. Subsequently, the Dynamic Extractor is utilized to extract the dynamic features of the in-shop cloth, generating dynamic tokens. Finally, the dynamic tokens and modulated prior maps are input into a fixed pre-trained diffusion model, which produces the final output. The terms "Locked" and "Lockless" represent frozen and learnable parameters, respectively.
  • Figure 3: Visual comparison of different models (FS-VTON, HR-VTON, RT-VTON, and ours) in the dynamic generation and static characteristics preservation of clothes.
  • Figure 4: Visual comparison of traditional virtual try-on methods and ours. A case where the parsing of the reference person fails is chosen to demonstrate the robustness of our method.
  • Figure 5: Visual comparison between our method and traditional virtual try-on methods on model images with moderately difficult poses (raised hands, crossed hands).
  • ...and 3 more figures