Fashion Image-to-Image Translation for Complementary Item Retrieval
Matteo Attimonelli, Claudio Pomo, Dietmar Jannach, Tommaso Di Noia
TL;DR
GeCo tackles top-bottom fashion item retrieval under data scarcity with a two-stage framework. The first stage, the Complementary Item Generation Model (CIGM), is a conditioned Pix2Pix-based generator that produces compatible bottom templates from top images. The second stage, GeCo, uses these templates for compatibility-aware retrieval in a Composed Image Retrieval setting. A joint objective that combines a Bayesian preference objective with contrastive learning aligns the top, template, and bottom embeddings, supporting both ranking and compatibility assessment. Experiments on FashionVC, ExpFashion, and FashionTaobaoTB show that GeCo outperforms state-of-the-art baselines, with particular gains in low-data regimes and when high-quality templates are available. The work also releases FashionTaobaoTB to support future research and highlights the value of realistic generated templates for composed image retrieval in fashion.
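To make the joint objective more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the Bayesian preference term takes the standard Bayesian Personalized Ranking (BPR) form and that the contrastive term is an InfoNCE-style loss between template and bottom embeddings; this summary confirms neither. The additive fusion of top and template embeddings, the `weight` and `temperature` parameters, and all function names are hypothetical choices made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def bpr_loss(query, positive, negative):
    """BPR-style pairwise loss: -log(sigmoid(s(q, pos) - s(q, neg)))."""
    margin = dot(query, positive) - dot(query, negative)
    # softplus(-margin) equals -log(sigmoid(margin)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def info_nce_loss(anchors, targets, temperature=0.1):
    """InfoNCE-style loss: anchor i should match target i, not the other targets."""
    anchors = [l2_normalize(a) for a in anchors]
    targets = [l2_normalize(t) for t in targets]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, t) / temperature for t in targets]
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def joint_loss(tops, templates, positives, negatives, weight=0.5, temperature=0.1):
    """Ranking term plus weighted alignment term (both assumed forms)."""
    # Assumption: the composed query is the sum of top and template embeddings.
    queries = [[t + g for t, g in zip(top, tmpl)] for top, tmpl in zip(tops, templates)]
    ranking = sum(
        bpr_loss(q, p, n) for q, p, n in zip(queries, positives, negatives)
    ) / len(queries)
    alignment = info_nce_loss(templates, positives, temperature)
    return ranking + weight * alignment
```

In this sketch the ranking term pushes the composed query closer to the compatible bottom than to a sampled incompatible one, while the alignment term pulls each generated template toward the embedding of its ground-truth bottom. How the two terms are actually weighted in GeCo is not stated here.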
Abstract
The increasing demand for online fashion retail has boosted research in fashion compatibility modeling and item retrieval, focusing on matching user queries (textual descriptions or reference images) with compatible fashion items. A key challenge is top-bottom retrieval, where precise compatibility modeling is essential. Traditional methods, often based on Bayesian Personalized Ranking (BPR), have shown limited performance. Recent efforts have explored using generative models in compatibility modeling and item retrieval, where generated images serve as additional inputs. However, these approaches often overlook the quality of generated images, which could be crucial for model performance. Additionally, generative models typically require large datasets, posing challenges when such data is scarce. To address these issues, we introduce the Generative Compatibility Model (GeCo), a two-stage approach that improves fashion image retrieval through paired image-to-image translation. First, the Complementary Item Generation Model (CIGM), built on Conditional Generative Adversarial Networks (GANs), generates target item images (e.g., bottoms) from seed items (e.g., tops), offering conditioning signals for retrieval. These generated samples are then integrated into GeCo, enhancing compatibility modeling and retrieval accuracy. Evaluations on three datasets show that GeCo outperforms state-of-the-art baselines. Key contributions include: (i) the GeCo model utilizing paired image-to-image translation within the Composed Image Retrieval framework, (ii) comprehensive evaluations on benchmark datasets, and (iii) the release of a new Fashion Taobao dataset designed for top-bottom retrieval, promoting further research.
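The retrieval stage described above can be illustrated with a small sketch, under stated assumptions. It presumes that a top image, its generated bottom template, and every candidate bottom have already been mapped to embedding vectors by some encoder, and it ranks candidates by a weighted sum of cosine similarities. The scoring rule, the `template_weight` parameter, and the function names are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    numerator = sum(a * b for a, b in zip(u, v))
    denominator = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return numerator / denominator if denominator > 0 else 0.0


def rank_bottoms(top_emb, template_emb, candidates, template_weight=0.5):
    """Return candidate ids sorted from most to least compatible.

    candidates maps an item id to its embedding. The score mixes similarity
    to the top with similarity to the generated template (assumed rule).
    """
    def score(emb):
        return ((1.0 - template_weight) * cosine(top_emb, emb)
                + template_weight * cosine(template_emb, emb))

    return sorted(candidates, key=lambda cid: score(candidates[cid]), reverse=True)


if __name__ == "__main__":
    # Toy two-dimensional embeddings, invented purely for illustration.
    top = [1.0, 0.0]
    template = [0.0, 1.0]
    bottoms = {"jeans": [0.1, 0.9], "skirt": [0.9, 0.1], "shorts": [-1.0, -1.0]}
    print(rank_bottoms(top, template, bottoms, template_weight=0.7))
```

Setting `template_weight` to zero reduces this sketch to top-only retrieval, which is one simple way to see how much the generated template changes the ranking.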
