Try-On-Adapter: A Simple and Flexible Try-On Paradigm

Hanzhong Guo, Jianfeng Zhang, Cheng Zou, Jun Li, Meng Wang, Ruxue Wen, Pingzhong Tang, Jingdong Chen, Ming Yang

TL;DR

This paper proposes Try-On-Adapter (TOA), an outpainting paradigm, in contrast to the existing inpainting paradigm, that preserves the given face and garment, naturally imagines the remaining parts of the image, and offers flexible control through various conditions.

Abstract

Image-based virtual try-on, widely used in online shopping, aims to generate images of a naturally dressed person conditioned on given garments, and it has significant research and commercial potential. A key challenge of try-on is to generate realistic images of the model wearing the garments while preserving the garments' details. Previous methods treat try-on as an inpainting task: they mask certain parts of the original model's standing image and then inpaint the masked areas to generate realistic images of the model wearing the corresponding reference garments. However, such implementations require the user to provide a complete, high-quality standing image, which is user-unfriendly in practical applications. In this paper, we propose Try-On-Adapter (TOA), an outpainting paradigm that differs from the existing inpainting paradigm. TOA preserves the given face and garment, naturally imagines the remaining parts of the image, and offers flexible control through various conditions, e.g., garment properties and human pose. In qualitative comparisons, TOA performs well on the virtual try-on task even when given relatively low-quality face and garment images. In quantitative comparisons, TOA achieves state-of-the-art FID scores of 5.56 and 7.23 in the paired and unpaired settings, respectively, on the VITON-HD dataset.
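The contrast between the two paradigms can be illustrated with the masks each one implies. The following is a minimal conceptual sketch, not the paper's pipeline: the helper names, the canvas size, and the box coordinates are all hypothetical, and it assumes the convention that 1 marks pixels to be generated and 0 marks pixels kept from the input.

```python
# Conceptual sketch only: helper names, canvas size, and boxes are hypothetical.
# Convention assumed here: 1 = pixel the model must generate, 0 = pixel kept from the input.

def inpainting_mask(height, width, garment_box):
    """Inpainting view: a full person image is given; only the garment region is regenerated."""
    top, left, bottom, right = garment_box
    return [
        [1 if top <= r < bottom and left <= c < right else 0 for c in range(width)]
        for r in range(height)
    ]

def outpainting_mask(height, width, kept_boxes):
    """Outpainting view: only small patches (e.g. face, garment) are given; the rest is generated."""
    def is_kept(r, c):
        return any(t <= r < b and l <= c < rt for (t, l, b, rt) in kept_boxes)
    return [[0 if is_kept(r, c) else 1 for c in range(width)] for r in range(height)]

def generated_fraction(mask):
    """Share of the canvas that the model has to synthesize."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total

# Toy 8x6 canvas: a face patch at the top and a garment patch in the middle.
face_box = (0, 2, 2, 4)      # (top, left, bottom, right), hypothetical
garment_box = (3, 1, 6, 5)   # hypothetical

inpaint = inpainting_mask(8, 6, garment_box)
outpaint = outpainting_mask(8, 6, [face_box, garment_box])

print(f"inpainting generates {generated_fraction(inpaint):.0%} of the canvas")
print(f"outpainting generates {generated_fraction(outpaint):.0%} of the canvas")
```

In this toy setup the inpainting mask covers only the garment region of an otherwise complete image, whereas the outpainting mask covers everything except the supplied face and garment patches, which is why the outpainting setting does not need a complete standing image as input.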

Paper Structure

This paper contains 16 sections, 7 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Results of Try-On-Adapter. The first column gives the reference face (top) and the reference pose (bottom). The second column shows two target garments. The third to fifth columns show try-on results generated by TOA under different conditions. The third column is generated from the reference face and target garments with null text guidance and no pose control. The fourth column is generated from the reference face and target garments with text guidance and no pose control. The last column is generated from the reference face and target garments with both text guidance and pose control.
  • Figure 2: Overall architecture of Try-On-Adapter. The blocks in red are trainable, while the blocks in blue are frozen pretrained blocks. The blocks in orange represent the original publicly available try-on dataset, and the blocks in yellow represent the obtained dataset. The proposed Try-On-Adapter primarily comprises two modules: the fusion of different references in the Cross-Attention Block, and the Reference U-Net specifically designed for garments.
  • Figure 3: Qualitative results of Try-On-Adapter. (a) is the single-dataset evaluation, where the faces and the garments are from VITON-HD; (b) is the cross-dataset evaluation, where the faces are from VITON-HD and the garments are from the Internet. (a)-(2) shows TOA's generation given the pose in (a)-(1) and the face in (a)-(4). (I)-(a)-(3) shows the generation under the text guidance "pink dress", and (II)-(a)-(3) under "pink tops". The generations in (b)-(5) and (b)-(7) are conditioned on the garments from (b)-(6) and (b)-(8), respectively.