Significance of Anatomical Constraints in Virtual Try-On

Debapriya Roy, Sanchayan Santra, Diganta Mukherjee, Bhabatosh Chanda

TL;DR

This work tackles Virtual Try-On under challenging poses, where arm bending and overlaps between garment parts degrade the realism of traditional TPS- or flow-based warps. It introduces an Anatomy-Aware Geometric (ATAG) transform that models the upper garment as independently warpable torso and sleeves, enabling pose-consistent deformations, and a part-based warping pipeline combined with a Mask Prediction Network (MPN) and an Image Synthesizer Network (ISN) to handle occlusions and synthesis. In experiments on MPV and VITON-HD, the method demonstrates competitive quantitative performance and clearer visual quality in complex poses, with ablations confirming the benefit of the MPN and the parsing branch. The approach advances pose-robust VTON by aligning garment warping with human anatomy, and it offers a framework that can extend to more garment parts and higher-resolution synthesis.

Abstract

A Virtual Try-ON (VTON) system allows a user to try a product virtually. In general, a VTON system takes a clothing source and a person's image and predicts the try-on output of the person in the given clothing. Although existing methods perform well for simple poses, they generate inaccurate clothing deformations for bent or crossed arms, or when the alignment of the source clothing differs significantly from the pose of the target person. In VTON methods that employ Thin Plate Spline (TPS) based clothing transformations, this occurs mainly for two reasons: (1) the second-order smoothness constraint of TPS restricts the bending of the object plane; (2) overlaps among different clothing parts (e.g., sleeves and torso) cannot be modeled by a single TPS transformation, which treats the clothing as a single planar object and therefore disregards the independent movement of different clothing parts. To this end, we make two major contributions. To address the bending limitations of TPS, we propose a human AnaTomy-Aware Geometric (ATAG) transformation. To address the overlap issue, we propose a part-based warping approach that divides the clothing into independently warpable parts, warps them separately, and later combines them. Extensive analysis shows the efficacy of this approach.
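
The part-based idea in the abstract (warp each clothing part separately, then combine the results) can be illustrated with a minimal sketch. This is not the ATAG transformation: a plain affine map stands in for the per-part warp, and the function names `warp_part` and `composite_parts`, the toy 8x8 garment, and the rule that the sleeve is drawn over the torso are all assumptions made for illustration only.

```python
import numpy as np

def warp_part(image, mask, matrix, offset):
    """Warp one clothing part by inverse mapping (hypothetical helper).

    `matrix` (2x2) and `offset` (2,) send each target (row, col) to a source
    (row, col). An affine map is a stand-in here, NOT the paper's ATAG warp.
    """
    h, w = mask.shape
    rows, cols = np.mgrid[0:h, 0:w]
    target = np.stack([rows.ravel(), cols.ravel()])            # 2 x (h*w)
    source = np.rint(matrix @ target + offset[:, None]).astype(int)
    # Keep only target pixels whose source location lies inside the image.
    inside = (
        (source[0] >= 0) & (source[0] < h)
        & (source[1] >= 0) & (source[1] < w)
    )
    warped_image = np.zeros_like(image)
    warped_mask = np.zeros_like(mask)
    t_r, t_c = target[0][inside], target[1][inside]
    s_r, s_c = source[0][inside], source[1][inside]
    warped_image[t_r, t_c] = image[s_r, s_c]                   # nearest neighbour
    warped_mask[t_r, t_c] = mask[s_r, s_c]
    return warped_image, warped_mask

def composite_parts(parts, shape):
    """Combine warped parts; later parts overwrite earlier ones (assumed order)."""
    canvas = np.zeros(shape, dtype=float)
    for part_image, part_mask in parts:
        canvas = np.where(part_mask > 0, part_image, canvas)
    return canvas

# Toy grayscale garment: torso pixels have value 1.0, sleeve pixels 2.0.
h = w = 8
torso_mask = np.zeros((h, w))
torso_mask[2:6, 3:5] = 1.0
sleeve_mask = np.zeros((h, w))
sleeve_mask[2:4, 0:3] = 1.0
cloth = 1.0 * torso_mask + 2.0 * sleeve_mask

identity = np.eye(2)
# The torso stays in place; the sleeve moves two columns right onto the torso.
torso = warp_part(cloth * torso_mask, torso_mask, identity, np.array([0.0, 0.0]))
sleeve = warp_part(cloth * sleeve_mask, sleeve_mask, identity, np.array([0.0, -2.0]))
result = composite_parts([torso, sleeve], (h, w))
```

Because each part carries its own transform and mask, the sleeve can land on top of the torso, which a single smooth warp of the whole garment cannot express. In this sketch the overlap is resolved by paste order alone.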

Paper Structure

This paper contains 16 sections, 7 equations, 13 figures, 4 tables, and 1 algorithm.

Figures (13)

  • Figure 1: A comparison with two benchmark methods, ACGPN (warping-based, C2P) and He et al. (flow-based, C2P), illustrating their inability to warp the sleeves under significant arm bending (first row) and the inappropriately transformed clothing they produce when clothing parts overlap, i.e., when the sleeves overlap the torso (second row).
  • Figure 2: Demonstration of simple and complex human poses.
  • Figure 3: (a) (Top) Elbow flexion and extension (picture courtesy elbow_flexion). (Bottom) Stretching and folding of a clothing sleeve due to elbow flexion, i.e., arm bending (picture courtesy sleeve_folding). (b) A graphical illustration of the arm-bending phenomenon in humans. A sample pair of model and person relevant to the illustrated scenario is given above for better understanding. (c) Plot of the functions $f(\phi_1, \phi_2)$, $g(\phi)$ and $h(\phi_1, \phi_2)$. (d) Geometrical illustration of our method for warping sleeves in two different scenarios. (Left) A case in which assumption 1 holds. (Right) A case in which assumption 2 holds. Here $\{A, B, C\}$ and $\{A', B', C'\}$ are the landmarks corresponding to the arm of the person and the model, respectively. $X$ is a point belonging to the sleeve segment of the target warp and $X'$ is its corresponding source pixel.
  • Figure 4: The effect of line scaling in our method. As the line length increases from source to target, the corresponding part of the rectangle appears zoomed in in the result.
  • Figure 5: Illustration of our overall virtual try-on approach.
  • ...and 8 more figures