Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models
Qice Qin, Yuki Hirakawa, Ryotaro Shimizu, Takuya Furusawa, Edgar Simo-Serra
TL;DR
This work addresses the task of making outfit images more fashionable through editing, without external prompts, by combining a diffusion-based generator that uses segmentation-conditioned controls with a classifier-guided fashionability objective. It introduces two expert-annotated datasets (OpenSkill-based and 5-Dimension), which are used to train and evaluate a Mid-U guidance system that steers latent diffusion toward more fashionable outputs while preserving body shape and identity. Empirical results show significant gains over the Fashion++ baseline in both quantitative fashionability predictions and qualitative image quality, supported by a user study and detailed failure analyses. The approach offers a practical, interpretable framework for automatic fashionability enhancement, with potential applications in virtual try-on, fashion design, and e-commerce workflows.
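The classifier-guided objective can be illustrated with a minimal sketch of generic classifier guidance in the style of Dhariwal and Nichol: at each reverse step, the denoiser's predicted mean is shifted by the gradient of a classifier's log-probability, scaled by a guidance weight and the step variance. Everything here is an assumption for illustration. `ToyFashionabilityClassifier` is a hypothetical logistic scorer on a small vector standing in for a diffusion latent, and the "denoiser" is a placeholder shrinkage step. This is not the paper's Mid-U guidance implementation; in a real latent diffusion model the gradient would come from backpropagating through a learned classifier.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ToyFashionabilityClassifier:
    """Hypothetical logistic scorer: p(fashionable | x) = sigmoid(w . x + b)."""

    def __init__(self, w, b=0.0):
        self.w = np.asarray(w, dtype=float)
        self.b = float(b)

    def prob_fashionable(self, x):
        return float(sigmoid(self.w @ x + self.b))

    def grad_log_prob(self, x):
        # Analytic gradient: d/dx log sigmoid(w . x + b) = (1 - sigmoid(w . x + b)) * w
        return (1.0 - self.prob_fashionable(x)) * self.w


def guided_mean(x_t, mu, var, clf, scale):
    """Classifier guidance: shift the denoiser mean by scale * var * grad log p(y | x_t)."""
    return mu + scale * var * clf.grad_log_prob(x_t)


def sample(clf, x_T, steps=50, scale=2.0, var=0.01, seed=0):
    """Toy reverse process; the shrinkage 'denoiser' below is a placeholder assumption."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x_T, dtype=float)
    for _ in range(steps):
        mu = 0.98 * x  # hypothetical denoiser mean, not a trained network
        noise = np.sqrt(var) * rng.standard_normal(x.shape)
        x = guided_mean(x, mu, var, clf, scale) + noise
    return x
```

Running `sample` twice with the same seed, once with `scale=2.0` and once with `scale=0.0`, shows the effect: the two trajectories share identical noise, so the guided one differs only by positive steps along the classifier's weight vector and ends with a higher fashionability probability. Larger `scale` pushes harder toward what the classifier prefers, at the risk of drifting further from the denoiser's own prediction.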
Abstract
Image generation in the fashion domain has predominantly focused on preserving body characteristics or following input prompts, while little attention has been paid to improving the inherent fashionability of the output images. This paper presents a novel diffusion model-based approach that generates fashion images with improved fashionability while maintaining control over key attributes. The key components of our method are: 1) fashionability enhancement, which ensures that the generated images are more fashionable than the input; 2) preservation of body characteristics, which encourages the generated images to maintain the original shape and proportions of the input; and 3) automatic fashion optimization, which does not rely on manual input or external prompts. We also employ two methods to collect the data used both to train the guidance and to evaluate the generated images. Specifically, we score outfit images for fashionability using pairwise comparisons annotated by multiple fashion experts, aggregated in two ways: OpenSkill-based ratings and comparisons along five critical aspects. These two methods provide complementary perspectives for assessing and improving the fashionability of the generated images. Experimental results show that our approach outperforms the Fashion++ baseline in generating more fashionable images, demonstrating its effectiveness in producing more stylish and appealing fashion images.
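To make the OpenSkill-based scoring concrete, the sketch below turns expert pairwise preferences into per-image ratings using a simplified Bradley-Terry update from the Weng-Lin Bayesian rating model, the model family that the OpenSkill library implements. It is written from scratch rather than calling the library. The constants (`mu = 25`, `sigma = 25/3`, `BETA = 25/6`, `KAPPA = 1e-4`) are common defaults assumed here, not values reported by the paper, and `update_pair`, `rate_outfits`, and `conservative_score` are hypothetical helper names.

```python
import math
from dataclasses import dataclass


@dataclass
class Rating:
    mu: float = 25.0           # assumed default mean
    sigma: float = 25.0 / 3.0  # assumed default uncertainty


BETA = 25.0 / 6.0  # assumed performance noise
KAPPA = 1e-4       # lower bound on the variance multiplier


def _updated(r: Rating, score: float, p: float, c: float) -> Rating:
    """Weng-Lin Bradley-Terry update for one side of a 1-vs-1 comparison."""
    var = r.sigma ** 2
    gamma = r.sigma / c                            # default gamma in Weng-Lin
    omega = var / c * (score - p)                  # mean shift: observed minus expected
    delta = gamma * var / c ** 2 * p * (1.0 - p)   # relative variance reduction
    return Rating(r.mu + omega, math.sqrt(var * max(1.0 - delta, KAPPA)))


def update_pair(winner: Rating, loser: Rating, draw: bool = False) -> tuple:
    """Update both ratings after experts preferred `winner` over `loser` (or tied)."""
    c = math.sqrt(winner.sigma ** 2 + loser.sigma ** 2 + 2 * BETA ** 2)
    # Predicted probability that `winner` is preferred: exp(mu_w/c) / (exp(mu_w/c) + exp(mu_l/c))
    p_w = 1.0 / (1.0 + math.exp((loser.mu - winner.mu) / c))
    s_w = 0.5 if draw else 1.0
    return _updated(winner, s_w, p_w, c), _updated(loser, 1.0 - s_w, 1.0 - p_w, c)


def rate_outfits(ids, comparisons):
    """Fold a sequence of (preferred_id, other_id) judgments into per-image ratings."""
    ratings = {i: Rating() for i in ids}
    for a, b in comparisons:
        ratings[a], ratings[b] = update_pair(ratings[a], ratings[b])
    return ratings


def conservative_score(r: Rating) -> float:
    """Lower-confidence score that penalizes images with few comparisons."""
    return r.mu - 3.0 * r.sigma


if __name__ == "__main__":
    judgments = [("a", "b"), ("a", "c"), ("b", "c")]
    ranked = sorted(rate_outfits(["a", "b", "c"], judgments).items(),
                    key=lambda kv: -conservative_score(kv[1]))
    for image_id, r in ranked:
        print(image_id, round(r.mu, 2), round(r.sigma, 2))
```

Each outfit image plays the role of a player and each expert judgment the role of a match. Ranking by `mu - 3 * sigma` rather than `mu` alone favors images whose high rating is backed by many comparisons.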
