Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model
Yingmao Miao, Zhanpeng Huang, Rui Han, Zibin Wang, Chenhao Lin, Chao Shen
TL;DR
This work introduces a diffusion-based framework for virtual try-on of ornaments (bracelets, rings, earrings, necklaces) that preserves ornament identity and geometric structure under varying poses and scales. A ReferenceNet injects reference ornament features into the denoising network, a pose-aware wearing mask is learned jointly and refined iteratively, and a Mask-guided Attention mechanism makes the attention geometry-aware. Together, these components yield realistic results with strong identity preservation across diverse ornament types and poses. The approach outperforms garment-focused try-on and object-insertion baselines and has practical potential for advertising and e-commerce; the authors acknowledge lighting biases and orientation challenges as areas for future work.
Abstract
While virtual try-on for clothes and shoes with diffusion models has gained traction, virtual try-on for ornaments, such as bracelets, rings, earrings, and necklaces, remains largely unexplored. Because most ornaments contain intricate tiny patterns and repeated geometric sub-structures, it is much more difficult to guarantee identity and appearance consistency under large pose and scale variations between ornaments and models. This paper proposes the task of virtual try-on for ornaments and presents a method to improve the geometric and appearance preservation of ornament virtual try-on. Specifically, we estimate an accurate wearing mask to improve the alignment between ornaments and models in an iterative scheme alongside the denoising process. To preserve structural details, we further regularize attention layers to implicitly map the reference ornament mask to the wearing mask. Experimental results demonstrate that our method successfully transfers ornaments from reference images onto target models, handling substantial differences in scale and pose while preserving identity and achieving realistic visual effects.
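
The attention regularization described above can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes one plausible reading, in which the cross-attention weights from target-image tokens to reference-ornament tokens are used to carry the reference ornament mask into the target image, and the result is penalized for deviating from the wearing mask. All function names (`cross_attention`, `transfer_mask`, `mask_alignment_loss`), the mean-squared-error penalty, and the two-token inputs are hypothetical choices made for illustration. Plain Python lists are used so the sketch has no dependencies.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys):
    """Attention weights from target tokens (queries) to reference tokens (keys).

    Returns one row per target token; each row sums to 1 over reference tokens.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def transfer_mask(attn, ref_mask):
    """Carry the reference ornament mask to the target image.

    Each target token receives the attention-weighted average of the
    reference mask values (assumed mechanism, not taken from the paper).
    """
    return [sum(a * m for a, m in zip(row, ref_mask)) for row in attn]


def mask_alignment_loss(attn, ref_mask, wearing_mask):
    """Mean squared error between the transferred mask and the wearing mask."""
    pred = transfer_mask(attn, ref_mask)
    return sum((p - w) ** 2 for p, w in zip(pred, wearing_mask)) / len(pred)


# Toy setup: two target tokens, two reference tokens, 2-D features.
# Target token 0 matches reference token 0, target token 1 matches token 1.
queries = [[4.0, 0.0], [0.0, 4.0]]
keys = [[4.0, 0.0], [0.0, 4.0]]
ref_mask = [1.0, 0.0]  # reference token 0 lies on the ornament

attn = cross_attention(queries, keys)
aligned_loss = mask_alignment_loss(attn, ref_mask, [1.0, 0.0])
misaligned_loss = mask_alignment_loss(attn, ref_mask, [0.0, 1.0])
print(aligned_loss, misaligned_loss)
```

In this toy setup the loss is close to zero when the wearing mask marks the target token that attends to the ornament region of the reference, and close to one when it marks the other token. A penalty of this kind would push ornament pixels in the generated image to draw their features from ornament pixels in the reference.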
