Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model

Yingmao Miao, Zhanpeng Huang, Rui Han, Zibin Wang, Chenhao Lin, Chao Shen

TL;DR

This work introduces a diffusion-based framework for ornament virtual try-on (bracelets, rings, earrings, necklaces) that targets high-fidelity identity preservation and geometric structure under varying poses and scales. It predicts a pose-aware wearing mask through iterative refinement alongside denoising, enforces geometry-aware attention via a Mask-guided Attention mechanism, and uses a ReferenceNet to inject reference ornament features into the denoising network. Together, these components yield realistic ornament wear with strong identity preservation across diverse ornament types and poses. The approach outperforms garment-focused and insertion baselines and has practical potential for advertising and e-commerce; the authors acknowledge lighting biases and orientation challenges as areas for future work.

Abstract

While virtual try-on for clothes and shoes with diffusion models has gained traction, virtual try-on for ornaments, such as bracelets, rings, earrings, and necklaces, remains largely unexplored. Because most ornaments contain intricate tiny patterns and repeated geometric sub-structures, it is much more difficult to guarantee identity and appearance consistency under large pose and scale variations between ornaments and models. This paper proposes the task of virtual try-on for ornaments and presents a method to improve the geometric and appearance preservation of ornament virtual try-on. Specifically, we estimate an accurate wearing mask to improve the alignment between ornaments and models in an iterative scheme alongside the denoising process. To preserve structural details, we further regularize attention layers to map the reference ornament mask to the wearing mask in an implicit way. Experimental results demonstrate that our method successfully places ornaments from reference images onto target models, handling substantial differences in scale and pose while preserving identity and achieving realistic visual effects.
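
The attention regularization described above can be illustrated with a small sketch. This is one plausible reading, not the authors' implementation: it assumes the attention weights between target (model) positions and reference (ornament) positions are used to carry the reference ornament mask over to the target, and that the result is compared with the ground-truth wearing mask. The function names, the flattened token layout, and the mean-squared-error penalty are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_mapped_mask(queries, keys, ref_mask):
    """Carry a reference mask to target positions through attention weights.

    queries:  (N_target, d) features at target (model image) positions
    keys:     (N_ref, d)    features at reference (ornament image) positions
    ref_mask: (N_ref,)      reference ornament mask with values in [0, 1]

    Returns an (N_target,) soft mask. Each value is a convex combination of
    reference mask values, so it stays in [0, 1].
    """
    d = queries.shape[-1]
    # Scaled dot-product attention weights; each row sums to 1.
    attn = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return attn @ ref_mask


def mask_guidance_loss(queries, keys, ref_mask, wearing_mask):
    """Hypothetical penalty: mean squared error between the attention-mapped
    reference mask and the ground-truth wearing mask (N_target,).

    The mask is never written into the attention map itself; the attention
    weights are only encouraged to map one mask onto the other.
    """
    pred = attention_mapped_mask(queries, keys, ref_mask)
    return float(np.mean((pred - wearing_mask) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(6, 4))   # 6 target tokens, feature dim 4
    k = rng.normal(size=(5, 4))   # 5 reference tokens
    ref = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
    gt = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
    print(mask_guidance_loss(q, k, ref, gt))
```

In this reading, minimizing the penalty pushes target positions inside the wearing mask to attend to reference positions inside the ornament mask, which matches the "implicit" mapping the abstract describes without hard-masking the attention map.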

Paper Structure

This paper contains 18 sections, 10 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Shining Yourself. We propose the virtual try-on task for ornaments including bracelets, rings, earrings, and necklaces for the first time. Our method achieves realistic virtual try-on results and high-fidelity identity preservation of ornaments using pose-aware mask prediction and mask-guided attention. Project Page: https://shiningyourself.github.io/
  • Figure 2: The overview of our method. a) In training, given reference ornament and model images and masks, our method concatenates the ornament and masked model images as input to the ReferenceNet branch, which extracts features to predict the wearing mask in an iterative way. The extracted features are also injected into the denoising U-Net to improve detail generation. b) We enforce the attention layers to preserve structural details by formulating the layers to map the reference ornament mask to the ground-truth wearing mask in an implicit way, rather than directly imposing the mask onto attention maps.
  • Figure 3: Visual comparison between previous methods and ours. No existing method could keep appearance and structure consistent, especially geometric details and numbers of components in ornaments. Our method preserves both details and identity and achieves high-quality and high-fidelity fitting results.
  • Figure 4: Virtual try-on results on other categories including bracelets, rings, necklaces, and earrings.
  • Figure 5: Visual comparisons of our model with different module configurations. The full model achieves the best results with the two proposed modules.
  • ...and 5 more figures