ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model
Binghui Chen, Wenyu Li, Yifeng Geng, Xuansong Xie, Wangmeng Zuo
TL;DR
This work addresses generating hyper-realistic advertising images of a human wearing a user-specified shoe while preserving the shoe's identity. It introduces ShoeModel, a three-module diffusion-based pipeline consisting of Wearable-area Detection (WD), Leg-pose Synthesis (LpS), and Shoe-wearing Image Generation (SW), with a staged training strategy. Two new datasets support training: a wearable-area detection dataset and a large shoe-leg dataset for pose-conditioned generation. Quantitative and qualitative experiments show ShoeModel outperforms diffusion-based and inpainting baselines in image realism, identity preservation, and human-shoe interaction plausibility, highlighting its practical impact for automated e-commerce advertising content generation. The approach enables consistent, realistic shoe-advertising imagery and can inspire further research on object-identity preservation and object-human interaction in controllable diffusion systems.
Abstract
With the development of large-scale diffusion models, Artificial Intelligence Generated Content (AIGC) techniques have recently become popular. However, how to make them truly serve our daily lives remains an open question. To this end, in this paper we focus on applying AIGC techniques to one field of E-commerce marketing, i.e., generating hyper-realistic advertising images that display user-specified shoes worn by humans. Specifically, we propose a shoe-wearing system, called ShoeModel, to generate plausible images of human legs interacting with the given shoes. It consists of three modules: (1) a shoe wearable-area detection module (WD), (2) a leg-pose synthesis module (LpS), and (3) a shoe-wearing image generation module (SW). The three modules are executed sequentially in this order. Compared to baselines, ShoeModel is shown to generalize better to different types of shoes, to preserve the ID-consistency of the given shoes, and to automatically produce reasonable interactions with humans. Extensive experiments show the effectiveness of our proposed shoe-wearing system. Figure 1 shows input and output examples of ShoeModel.
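The staged design can be illustrated with a minimal Python sketch. It shows only the ordering and data flow between the three modules: WD feeds LpS, and LpS feeds SW. Several parts of the sketch are assumptions rather than the authors' implementation. The types (`ShoeImage`, `WearableArea`, `LegPose`, `ShoeWearingImage`), the `run_pipeline` function and the toy stand-ins are hypothetical placeholders. The per-stage outputs are also illustrative guesses: a bounding box for WD and keypoints for LpS.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical placeholder types: the abstract does not specify the
# modules' actual inputs or outputs.

@dataclass
class ShoeImage:
    name: str  # stands in for the user-specified shoe image

@dataclass
class WearableArea:
    box: Tuple[int, int, int, int]  # assumed (x0, y0, x1, y1) region

@dataclass
class LegPose:
    keypoints: List[Tuple[float, float]]  # assumed 2D leg keypoints

@dataclass
class ShoeWearingImage:
    shoe: ShoeImage
    pose: LegPose

WDFn = Callable[[ShoeImage], WearableArea]
LpSFn = Callable[[ShoeImage, WearableArea], LegPose]
SWFn = Callable[[ShoeImage, LegPose], ShoeWearingImage]

def run_pipeline(shoe: ShoeImage, wd: WDFn, lps: LpSFn, sw: SWFn) -> ShoeWearingImage:
    """Run the three modules in their fixed order: WD -> LpS -> SW."""
    area = wd(shoe)          # Stage 1: detect the shoe's wearable area
    pose = lps(shoe, area)   # Stage 2: synthesize a leg pose for that area
    return sw(shoe, pose)    # Stage 3: pose-conditioned image generation

# Toy stand-ins that only record call order (no real models involved).
calls: List[str] = []

def toy_wd(shoe: ShoeImage) -> WearableArea:
    calls.append("WD")
    return WearableArea(box=(10, 5, 40, 20))

def toy_lps(shoe: ShoeImage, area: WearableArea) -> LegPose:
    calls.append("LpS")
    x0, y0, x1, y1 = area.box
    # Place a single "ankle" keypoint at the centre of the wearable area.
    return LegPose(keypoints=[((x0 + x1) / 2, (y0 + y1) / 2)])

def toy_sw(shoe: ShoeImage, pose: LegPose) -> ShoeWearingImage:
    calls.append("SW")
    return ShoeWearingImage(shoe=shoe, pose=pose)

result = run_pipeline(ShoeImage("sneaker"), toy_wd, toy_lps, toy_sw)
print(calls)                  # ['WD', 'LpS', 'SW']
print(result.pose.keypoints)  # [(25.0, 12.5)]
```

Passing each module in as a callable keeps the orchestration independent of how a stage is realized. In practice each callable would presumably wrap a trained model.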
