GlamTry: Advancing Virtual Try-On for High-End Accessories

Ting-Yu Chang; Seretsi Khabane Lekena

TL;DR

Results demonstrate improved location prediction compared to the original model for clothes, even with a small dataset, and underscore the model's potential with larger datasets exceeding 10,000 images, paving the way for future research in virtual accessory try-on applications.

Abstract

The paper aims to address the lack of photorealistic virtual try-on models for accessories such as jewelry and watches, which are particularly relevant for online retail applications. While existing virtual try-on models focus primarily on clothing items, there is a gap in the market for accessories. This research explores the application of techniques from 2D virtual try-on models for clothing, such as VITON-HD, and integrates them with other computer vision models, notably MediaPipe Hand Landmarker. Drawing on existing literature, the study customizes and retrains a unique model using accessory-specific data and network architecture modifications to assess the feasibility of extending virtual try-on technology to accessories. Results demonstrate improved location prediction compared to the original model for clothes, even with a small dataset. This underscores the model's potential with larger datasets exceeding 10,000 images, paving the way for future research in virtual accessory try-on applications.

Paper Structure (24 sections, 3 equations, 21 figures)

Introduction
Literature Review
Dataset
Data Collection
Web Scraping
Kaggle
Data Pre-Processing
Human Parsing
OpenPose
MediaPipe
Accessories-mask
Agnostic-mask and Human-agnostic
Method
Experiment
Baseline Model
...and 9 more sections

Figures (21)

Figure 1: Overview of VITON-HD dataset.
Figure 2: The results of human parsing obtained from different models.
Figure 3: The predicted 21 key points of MediaPipe Hand Landmarker.
Figure 4: (a) The original image of a person wearing a watch and its human parsing output. (b) The original image of a person wearing a watch and its OpenPose output. (c) The original image of a person wearing a watch and its MediaPipe output. (d) The original image of the target watch from Kaggle and its mask.
Figure 5: (a) The initial unsuccessful prediction of the human-agnostic mask. (b) The enhanced human-agnostic mask achieved using our algorithm. (c) Arm masks with bounding boxes overlaid. (d) Agnostic mask highlighting the targeted watch region. (e) The original image with predicted MediaPipe Hand Landmarker hand pose. (f) The predicted watch location using the midpoint algorithm. (g) An instance of unsuccessful watch mask prediction. (h) An improved watch mask prediction by the midpoint algorithm.
...and 16 more figures
