Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment

Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan, Shuai Xiao

TL;DR

Wear-Any-Way provides a novel interaction form for customizing the wearing style and enables more liberated and flexible expressions of the attires, holding profound implications in the fashion industry.

Abstract

This paper introduces Wear-Any-Way, a novel framework for virtual try-on. Unlike previous methods, Wear-Any-Way is a customizable solution: besides generating high-fidelity results, it lets users precisely manipulate the wearing style. To achieve this, we first construct a strong pipeline for standard virtual try-on that supports single- and multiple-garment try-on and model-to-model settings in complicated scenarios. To make it manipulable, we propose sparse correspondence alignment, which uses point-based control to guide the generation at specific locations. With this design, Wear-Any-Way achieves state-of-the-art performance in the standard setting and provides a novel form of interaction for customizing the wearing style. For instance, users can drag a sleeve to roll it up, drag a coat to open it, and use clicks to control the style of tuck. Wear-Any-Way enables freer and more flexible expression of attire, with profound implications for the fashion industry.

Paper Structure (16 sections, 4 equations, 8 figures, 4 tables)


Figures (8)

  • Figure 1: Manipulable try-on with Wear-Any-Way. Our method achieves state-of-the-art performance in the standard virtual try-on setting (first row), supporting diverse input formats and scenarios. Notably, it also lets users manipulate the way a garment is worn through simple interactions such as clicks (second row) and drags (third row). All of these applications are handled by a single model in one pass.
  • Figure 2: The pipeline of Wear-Any-Way. The overall framework consists of two U-Nets. The reference U-Net takes the garment image as input to extract fine-grained features. The main U-Net generates the try-on results; it takes the masked person image, the garment mask, and the latent noise as input. Pose control is applied via an additional pose encoder. Point-based control is realized by a point embedding network and a sparse correspondence alignment module. The detailed structures are shown on the right. Flame and snowflake symbols denote trainable and frozen parameters, respectively.
  • Figure 3: Pipeline for collecting the training point pairs. As shown on the left, the person and garment images are fed into the same Stable Diffusion model to extract features. We compute the cosine similarity between the two feature maps to obtain the point pairs. Some densely sampled point pairs are shown on the right.
  • Figure 4: The inference pipeline of Wear-Any-Way. For click-based control, users provide garment images, person images, and point pairs to customize the generation. For drag-based control, the start and end points of the drag are treated as the garment and person points, respectively, and the parsed clothes are treated as the garment image. The drag is thus converted into the click-based setting.
  • Figure 5: Ablation studies on different feature extractors. We compare the CLIP image encoder, DINOv2, ControlNet, and our reference U-Net. Our results demonstrate notable superiority.
  • ...and 3 more figures
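
The point-pair collection described in the Figure 3 caption (cosine similarity between person and garment feature maps) can be illustrated with a minimal sketch. Everything here beyond "cosine similarity between the two feature maps" is an assumption made for illustration: the names `cosine` and `match_points` are hypothetical, the matching direction (person point to best garment location), the `min_sim` threshold, and the toy nested-list feature maps are not taken from the paper, and real features would come from a diffusion model rather than hand-written values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero feature as matching nothing
    return dot / (norm_u * norm_v)


def match_points(person_feats, garment_feats, query_points, min_sim=0.0):
    """For each (row, col) query on the person feature map, find the most
    similar location on the garment feature map.

    person_feats, garment_feats: H x W x C nested lists (toy stand-ins for
    feature maps; hypothetical format).
    Returns a list of ((person_row, person_col), (garment_row, garment_col), sim).
    Pairs whose best similarity is below min_sim (an assumed filter) are dropped.
    """
    pairs = []
    for pr, pc in query_points:
        query = person_feats[pr][pc]
        best = None
        for gr, row in enumerate(garment_feats):
            for gc, feat in enumerate(row):
                sim = cosine(query, feat)
                if best is None or sim > best[2]:
                    best = ((pr, pc), (gr, gc), sim)
        if best is not None and best[2] >= min_sim:
            pairs.append(best)
    return pairs


# Toy 1 x 2 feature maps with 2 channels each.
person = [[[1.0, 0.0], [0.0, 1.0]]]
garment = [[[0.0, 2.0], [3.0, 0.0]]]
pairs = match_points(person, garment, [(0, 0), (0, 1)])
```

In this toy case the person point at (0, 0) pairs with the garment point at (0, 1), and vice versa, since cosine similarity ignores feature magnitude. A brute-force search like this is only practical for small maps; a batched matrix product over normalized features would be the usual choice at scale.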