Progressive Limb-Aware Virtual Try-On

Xiaoyu Han, Shengping Zhang, Qinglin Liu, Zonglin Li, Chenyang Wang

TL;DR

PL-VTON is a progressive, limb-aware framework for image-based virtual try-on that targets two failure modes of earlier methods: blurred or incomplete clothing, and inaccurate limb textures. It combines three components: Multi-attribute Clothing Warping (MCW), which warps the garment through a two-stage alignment over multiple clothing attributes; a Human Parsing Estimator (HPE), which predicts a parsing map that constrains garment placement and limb regions; and Limb-aware Texture Fusion (LTF), which fuses textures in a coarse-to-fine manner with explicit limb guidance. The method achieves state-of-the-art performance on VITON, improving both qualitative realism and quantitative metrics such as FID, and remains robust under cross-category clothing changes such as long-sleeve to short-sleeve transformations. By preserving limb details and estimating garment geometry more accurately, it is relevant to e-commerce and fashion editing.

Abstract

Existing image-based virtual try-on methods directly transfer specific clothing to a human image without utilizing clothing attributes to refine the transferred clothing geometry and textures, which causes incomplete and blurred clothing appearances. In addition, these methods usually mask the limb textures of the input for the clothing-agnostic person representation, which results in inaccurate predictions for human limb regions (i.e., the exposed arm skin), especially when transforming between long-sleeved and short-sleeved garments. To address these problems, we present a progressive virtual try-on framework, named PL-VTON, which performs pixel-level clothing warping based on multiple attributes of clothing and embeds explicit limb-aware features to generate photo-realistic try-on results. Specifically, we design a Multi-attribute Clothing Warping (MCW) module that adopts a two-stage alignment strategy based on multiple attributes to progressively estimate pixel-level clothing displacements. A Human Parsing Estimator (HPE) is then introduced to semantically divide the person into various regions, which provides structural constraints on the human body and therefore alleviates texture bleeding between clothing and limb regions. Finally, we propose a Limb-aware Texture Fusion (LTF) module to estimate high-quality details in limb regions by fusing textures of the clothing and the human body with the guidance of explicit limb-aware features. Extensive experiments demonstrate that our proposed method outperforms the state-of-the-art virtual try-on methods both qualitatively and quantitatively. The code is available at https://github.com/xyhanHIT/PL-VTON.
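To make "pixel-level clothing displacements" concrete, the following is a minimal sketch of how a dense displacement field can be applied to an image by backward warping with bilinear sampling. It is not the authors' MCW implementation: the names `bilinear_sample` and `warp`, the single-channel list-of-lists image format, the (dx, dy) offset convention, and the zero padding outside the image are all assumptions chosen for illustration. In PL-VTON the displacements are predicted by a network; here they are written by hand.

```python
import math
from typing import List, Tuple

Image = List[List[float]]               # H x W single-channel image
Flow = List[List[Tuple[float, float]]]  # H x W grid of (dx, dy) offsets (assumed convention)


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Read img at fractional (x, y); samples outside the image contribute 0."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    value = 0.0
    # Blend the four neighbouring pixels, weighted by proximity.
    for yi, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for xi, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            if 0 <= xi < w and 0 <= yi < h:
                value += wx * wy * img[yi][xi]
    return value


def warp(img: Image, flow: Flow) -> Image:
    """Backward warp: out[y][x] is img sampled at (x + dx, y + dy)."""
    return [
        [bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(len(img[0]))]
        for y in range(len(img))
    ]


# Hypothetical 3 x 3 "clothing" image and a hand-written flow field.
cloth = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
read_one_pixel_right = [[(1.0, 0.0)] * 3 for _ in range(3)]
warped = warp(cloth, read_one_pixel_right)
```

Because each output pixel carries its own offset, such a field can bend a garment locally rather than applying one global transform, which is the property the abstract refers to as pixel-level warping.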

Paper Structure

This paper contains 24 sections, 9 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: The overview of the proposed PL-VTON. (a) MCW adopts a two-stage alignment strategy to estimate the aggregated flow $f_a$. (b) HPE estimates the target parsing map $P^t$ to provide structural constraints. (c) LTF first produces the coarse try-on result $I_c$ and then utilizes the information of the limb map to refine $I_c$ and get the fine try-on result $I_f$.
  • Figure 2: The human shape map contains geometric information of the clothing (e.g., the original collar shape), even if it has been down-sampled and interpolated.
  • Figure 3: The effect of the amount of limb information used to produce the try-on result.
  • Figure 4: Visual comparisons of five different methods. PL-VTON works well for the transformation between long and short sleeves (the first row), fancy clothing try-on (the second row), cognition of the collar and hem (the third row), clothing texture transfer (the fourth row), and limb detail retention (the last row).
  • Figure 5: The ablation study of the proposed two-stage alignment strategy in Multi-attribute Clothing Warping (MCW), where red boxes focus on the clothing shape and blue boxes focus on the clothing textures. PL-VTON$^\ast$ is PL-VTON without the two-stage alignment strategy.
  • ...and 1 more figure
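The Figure 1 caption states that LTF refines the coarse result $I_c$ into the fine result $I_f$ using the limb map. The sketch below illustrates only the general idea of mask-guided refinement with a fixed per-pixel blend. It is a stand-in under stated assumptions, not the learned LTF module: the function `fuse`, the soft limb map with values in [0, 1], and the blending rule itself are hypothetical and do not come from the paper.

```python
from typing import List

Image = List[List[float]]  # H x W single-channel image


def fuse(coarse: Image, limb_texture: Image, limb_map: Image) -> Image:
    """Per-pixel blend: limb_map = 1 takes limb_texture, limb_map = 0 keeps coarse.

    This fixed rule is an illustrative assumption; the paper's LTF is learned.
    """
    return [
        [m * t + (1.0 - m) * c for c, t, m in zip(c_row, t_row, m_row)]
        for c_row, t_row, m_row in zip(coarse, limb_texture, limb_map)
    ]


# Hypothetical 2 x 2 inputs: a flat coarse result and a flat limb texture.
coarse = [[2.0, 2.0], [2.0, 2.0]]
limb_texture = [[4.0, 4.0], [4.0, 4.0]]
limb_map = [[1.0, 0.0], [0.5, 0.0]]  # soft mask marking limb pixels
fine = fuse(coarse, limb_texture, limb_map)
```

Pixels where the limb map is zero pass the coarse result through unchanged, so the refinement is confined to limb regions. That is the role the caption assigns to the limb map, although the actual module estimates the refinement rather than applying a fixed blend.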