Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method

Shufang Zhang, Hang Qian, Minxue Ni, Yaxuan Li, Wenxin Ding, Jun Liu

TL;DR

This work tackles the mismatch between consumer size preferences and virtual try-on outputs by introducing SV-VTON, a diffusion-model–based system that enables multi-size garment fitting through refined, multi-size masks and proportionally adjusted garments. It combines a two-stage mask generation pipeline (Coarse Mask Generation Stage and Refined Mask Generation Stage) with an Edge Attention–enhanced refinement and a wrinkle-aware Evaluation Module to quantify how closely generated sizes align with international standards. The key contributions are: (i) multi-size VTON capability covering $A_1$, $A_2$, and $A_3$ fits, (ii) a two-stage mask refinement strategy to decouple size from original garment masks, and (iii) a quantitative size-variation evaluation framework using four metrics and wrinkle compensation to ensure realism. Experimental results on VITON-HD show SV-VTON achieves high sizing accuracy (typical errors around 5% or less) and robust generalization across different diffusion backbones, demonstrating practical utility for personalized, multi-size virtual try-on in e-commerce.

Abstract

With the rapid development of e-commerce, virtual try-on technology has become an essential tool for satisfying consumers' personalized clothing preferences. Diffusion-based virtual try-on systems aim to align garments naturally with target individuals, generating realistic and detailed try-on images. However, existing methods overlook the importance of garment size variations in meeting personalized consumer needs. To address this, we propose a novel virtual try-on method named SV-VTON, which introduces garment sizing concepts into virtual try-on tasks. SV-VTON first generates refined masks for multiple garment sizes, then combines these masks with proportionally adjusted garment images, enabling virtual try-on across different sizes. In addition, we develop a specialized size evaluation module to quantitatively assess the accuracy of size variations. This module calculates the differences between the size increments in generated images and those defined by international sizing standards, providing an objective measure of size accuracy. To validate SV-VTON's generalization capability, we conduct experiments on multiple state-of-the-art (SOTA) diffusion models. The results show that SV-VTON consistently achieves precise multi-size virtual try-on across these models, supporting the effectiveness and rationality of the proposed method for meeting users' personalized multi-size try-on requirements.

Paper Structure

This paper contains 25 sections, 6 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Multi-size VTON results generated by our method. Our approach can generate images for the same person in different garment sizes.
  • Figure 2: Testing results using models trained with loose masks (right) and tight masks (left). The regions marked by red boxes indicate problematic areas caused by single-size mask generation.
  • Figure 3: Overview of the proposed SV-VTON framework and Evaluation Module (EM). $I$ and $M_o$ denote the human source image and the original tight-fitting mask, respectively. The Multiple-Size Mask Generation Module has two stages: the Coarse Mask Generation Stage, which produces a coarse mask $M_C$, and the Refined Mask Generation Stage, which produces refined masks ($M_1$, $M_2$, and $M_3$) corresponding to the $A_1$, $A_2$, and $A_3$ sizes. The diffusion-based Try-On Module takes the multi-size masks and the proportionally adjusted garment images ($C_1$, $C_2$, and $C_3$) as input and generates results ($Y_1$, $Y_2$, and $Y_3$) for the same garment in different sizes. The EM quantitatively measures size increments using four defined metrics, treating garment wrinkles as compensation for different size proportions. It computes the deviations between the size increments of the generated images and the international standard increments to evaluate the validity and accuracy of the garment sizing variations.
  • Figure 4: Defined measurement standards. The lengths of the four red lines represent the measured dimensions, indicating the defined measurement positions, which are consistent with international size definitions.
  • Figure 5: Comparison of generated results across three size styles ($A_1$-size, $A_2$-size, $A_3$-size) using three different Diffusion models: Reference Net, StableVITON, and DCI-VTON.
  • ...and 1 more figure
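
The size evaluation described in the Figure 3 caption (measure four dimensions, take the increment between two sizes, compare it with the standard increment) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the measurement names `m1` to `m4`, the numeric values, the `standard_step` increments, and the use of relative error as the deviation. The wrinkle compensation mentioned in the caption is not modeled, because this excerpt does not specify how it is computed.

```python
from typing import Dict

# Hypothetical container: one value per measured dimension.
# The paper defines four measurements (Figure 4); the keys used
# below are placeholders, not the paper's names.
Measurements = Dict[str, float]


def size_increments(smaller: Measurements, larger: Measurements) -> Measurements:
    """Per-dimension increment between two adjacent garment sizes."""
    return {name: larger[name] - smaller[name] for name in smaller}


def relative_deviation(measured: Measurements, standard: Measurements) -> Measurements:
    """Relative error of each measured increment against a reference increment.

    Relative error is an assumed choice of deviation for this sketch.
    """
    return {
        name: abs(measured[name] - standard[name]) / standard[name]
        for name in standard
    }


# Placeholder measurements for two generated sizes (illustrative units).
a1 = {"m1": 50.0, "m2": 40.0, "m3": 60.0, "m4": 20.0}
a2 = {"m1": 52.1, "m2": 41.0, "m3": 62.0, "m4": 20.5}

# Placeholder reference increments; real values would come from the
# sizing standard being used.
standard_step = {"m1": 2.0, "m2": 1.0, "m3": 2.0, "m4": 0.5}

increments = size_increments(a1, a2)
deviations = relative_deviation(increments, standard_step)

for name, dev in deviations.items():
    print(f"{name}: increment={increments[name]:.2f}, deviation={dev:.1%}")
```

Keeping the increment and deviation steps as separate functions makes it straightforward to repeat the comparison for each adjacent pair of sizes, for example $A_1$ to $A_2$ and $A_2$ to $A_3$.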