VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration

Ben Li, Minqi Li, Jie Ren, Kaibing Zhang

TL;DR

VITON-DRR targets the loss of fine garment details in 2D image-based virtual try-on under large non-rigid deformations. It uses a three-module pipeline: the Human Semantics Generation Module (HSGM) predicts semantically guided body maps, the Deformation Module (DM) performs non-rigid registration using edge-keypoint matching and moving least squares (MLS) warping, and the Image Synthesis Module (ISM) synthesizes the final image with a conditional GAN (CGAN). The result is smoother deformations and richer texture preservation. On the Zalando dataset, the method retains garment details better than state-of-the-art methods in terms of perceptual fidelity, with competitive efficiency. The work supports practical VR and commerce applications by enabling more realistic and reliable virtual try-on across diverse poses, and could be extended with cross-modal cues to further improve generalization.

Abstract

Image-based virtual try-on aims to fit a target garment to a specific person image and has attracted extensive research attention because of its huge application potential in the e-commerce and fashion industries. To generate high-quality try-on results, accurately warping the clothing item to fit the human body plays a significant role, as slight misalignment may lead to unrealistic artifacts in the fitting image. Most existing methods warp the clothing by feature matching and thin-plate spline (TPS) transformation. However, this approach often fails to preserve clothing details because of self-occlusion, severe misalignment between poses, and similar factors. To address these challenges, this paper proposes a detail retention virtual try-on method via accurate non-rigid registration (VITON-DRR) for diverse human poses. Specifically, we reconstruct a human semantic segmentation using a dual-pyramid-structured feature extractor. Then, a novel Deformation Module extracts the cloth key points and warps them through an accurate non-rigid registration algorithm. Finally, the Image Synthesis Module synthesizes the deformed garment image and generates the human pose information adaptively. Compared with traditional methods, the proposed VITON-DRR makes the deformation of fitting images more accurate and retains more garment details. The experimental results demonstrate that the proposed method performs better than state-of-the-art methods.

Paper Structure

This paper contains 19 sections, 5 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Example of missing clothing details in virtual try-on results. The try-on results shown lose the pattern details of the clothes because the clothes were deformed and fused implausibly.
  • Figure 2: Examples of virtual try-on results synthesized by our method. Given a reference human body image and a product clothing image, our method can synthesize a high-quality virtual try-on result while preserving garment details.
  • Figure 3: An overview of our VITON-DRR, containing three main modules. (a) Human Semantics Generation Module predicts the human semantic segmentation $S_r$ given the inputs $(C_m, S_p, D_p)$, where $S_p$ is the clothing-agnostic semantic segmentation from the human semantic segmentation $S$, $D_p$ is the pose mapping, and $C_m$ is the clothing mask from the garment $C$. (b) Deformation Module warps $C$ through the feature point registration function $f$ with the target segmentation template $G_C$ to obtain the deformed garment $W_C$, where $G_C$ is the garment region obtained from $S_r$. (c) Image Synthesis Module synthesizes the final virtual try-on result $G_I$ using $W_C$, $S_r$, and $I_a$, where $I_a$ is the clothing-agnostic human image.
  • Figure 4: Example of clothing image registration. Given the clothing mask $C_m$ and the target area mask $G_C$, the edge point clouds $X$ and $Y$ are extracted, respectively. We register $X$ and $Y$ using a non-rigid point cloud registration method. The deformation parameters of $C$ are determined by the spatial motion of the point cloud $X$ during the registration process.
  • Figure 5: Schematic diagram of the generator and discriminator of Image Synthesis Module.
  • ...and 7 more figures
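
The Deformation Module steps described in the TL;DR and in Figure 4 (edge point extraction from a mask, then MLS warping driven by matched points) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `extract_edge_points` and `mls_affine`, the 4-neighbour edge rule, the affine variant of MLS, and the weight exponent `alpha` are all hypothetical choices. The non-rigid registration step that produces the correspondences between $X$ and $Y$ is not reproduced here, so the sketch assumes matched control points `p` and `q` are already available.

```python
import numpy as np


def extract_edge_points(mask):
    """Return (x, y) coordinates of foreground pixels that touch the background.

    Assumption: an "edge" pixel is a foreground pixel with at least one
    background 4-neighbour. The source does not specify the edge detector.
    """
    m = np.asarray(mask).astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    # A pixel is interior when its up, down, left and right neighbours are all foreground.
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1] & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    ys, xs = np.nonzero(m & ~interior)
    return np.stack([xs, ys], axis=1).astype(float)


def mls_affine(v, p, q, alpha=1.0, eps=1e-8):
    """Moving least squares (affine variant) deformation.

    v: (N, 2) points to deform.
    p: (K, 2) source control points, q: (K, 2) their target positions.
    Needs at least three non-collinear control points.
    """
    v, p, q = (np.asarray(a, dtype=float) for a in (v, p, q))
    # Inverse-distance weights: control points near v dominate its local fit.
    d2 = ((v[:, None, :] - p[None, :, :]) ** 2).sum(axis=-1)      # (N, K)
    w = 1.0 / (d2 ** alpha + eps)
    w_sum = w.sum(axis=1, keepdims=True)
    p_star = (w @ p) / w_sum                                       # weighted centroids, (N, 2)
    q_star = (w @ q) / w_sum
    p_hat = p[None, :, :] - p_star[:, None, :]                     # (N, K, 2)
    q_hat = q[None, :, :] - q_star[:, None, :]
    # Per-point weighted least squares for a 2x2 matrix M with p_hat @ M ~ q_hat.
    A = np.einsum("nk,nki,nkj->nij", w, p_hat, p_hat)              # (N, 2, 2)
    B = np.einsum("nk,nki,nkj->nij", w, p_hat, q_hat)              # (N, 2, 2)
    M = np.linalg.solve(A, B)
    return np.einsum("ni,nij->nj", v - p_star, M) + q_star


# Usage: extract edge points from a toy mask and move them with an MLS warp.
toy_mask = np.zeros((5, 5), dtype=np.uint8)
toy_mask[1:4, 1:4] = 1
edge_pts = extract_edge_points(toy_mask)
ctrl_src = np.array([[1.0, 1.0], [3.0, 1.0], [3.0, 3.0], [1.0, 3.0]])
ctrl_dst = ctrl_src + np.array([[0.0, 0.0], [0.5, 0.0], [0.5, 0.5], [0.0, 0.5]])
warped_pts = mls_affine(edge_pts, ctrl_src, ctrl_dst)
```

In a full pipeline, the same warp would be evaluated on a dense pixel grid to resample the garment image $C$; the sketch deforms only the edge points to keep the example short.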