Fashion Image-to-Image Translation for Complementary Item Retrieval

Matteo Attimonelli, Claudio Pomo, Dietmar Jannach, Tommaso Di Noia

TL;DR

GeCo tackles top-bottom fashion item retrieval under data scarcity with a two-stage framework. The first stage, CIGM, is a conditioned Pix2Pix-based generator that produces compatible bottom templates from tops. The second stage, GeCo, performs compatibility-aware retrieval within a Composed Image Retrieval setting. Training uses a joint objective that combines Bayesian preference ranking with contrastive learning to align top, template, and bottom embeddings, supporting both ranking and compatibility assessment. Experiments on FashionVC, ExpFashion, and FashionTaobaoTB show that GeCo outperforms state-of-the-art baselines, with the largest gains in low-data regimes and when high-quality templates are available. The work also releases FashionTaobaoTB to support future research and highlights the value of realistic generated templates for composed image retrieval in fashion.

Abstract

The increasing demand for online fashion retail has boosted research in fashion compatibility modeling and item retrieval, focusing on matching user queries (textual descriptions or reference images) with compatible fashion items. A key challenge is top-bottom retrieval, where precise compatibility modeling is essential. Traditional methods, often based on Bayesian Personalized Ranking (BPR), have shown limited performance. Recent efforts have explored using generative models in compatibility modeling and item retrieval, where generated images serve as additional inputs. However, these approaches often overlook the quality of generated images, which could be crucial for model performance. Additionally, generative models typically require large datasets, posing challenges when such data is scarce. To address these issues, we introduce the Generative Compatibility Model (GeCo), a two-stage approach that improves fashion image retrieval through paired image-to-image translation. First, the Complementary Item Generation Model (CIGM), built on Conditional Generative Adversarial Networks (GANs), generates target item images (e.g., bottoms) from seed items (e.g., tops), offering conditioning signals for retrieval. These generated samples are then integrated into GeCo, enhancing compatibility modeling and retrieval accuracy. Evaluations on three datasets show that GeCo outperforms state-of-the-art baselines. Key contributions include: (i) the GeCo model utilizing paired image-to-image translation within the Composed Image Retrieval framework, (ii) comprehensive evaluations on benchmark datasets, and (iii) the release of a new Fashion Taobao dataset designed for top-bottom retrieval, promoting further research.
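
The joint objective mentioned in the TL;DR can be illustrated with a minimal sketch. This is not the paper's implementation: the scoring function (dot products of the candidate bottom with the top and with the generated template), the InfoNCE form of the contrastive term, the `temperature`, and the `weight` that balances the two terms are all assumptions made for illustration, and embeddings are plain Python lists.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def bpr_loss(score_pos, score_neg):
    # -log(sigmoid(score_pos - score_neg)), computed as a stable softplus.
    x = score_neg - score_pos
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    # Contrastive term: cross-entropy over cosine similarities,
    # with the positive at index 0.
    a = normalize(anchor)
    logits = [dot(a, normalize(positive)) / temperature]
    logits += [dot(a, normalize(n)) / temperature for n in negatives]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

def joint_loss(top, template, pos_bottom, neg_bottom, weight=0.5):
    # Hypothetical compatibility score: the candidate bottom is compared
    # with both the top and the generated template (an assumption).
    def score(bottom):
        return dot(top, bottom) + dot(template, bottom)

    ranking = bpr_loss(score(pos_bottom), score(neg_bottom))
    alignment = info_nce(template, pos_bottom, [neg_bottom])
    return ranking + weight * alignment

# Toy 2-D embeddings: the template lies close to the compatible bottom.
top, template = [1.0, 0.0], [0.9, 0.1]
compatible, incompatible = [1.0, 0.0], [0.0, 1.0]
good = joint_loss(top, template, compatible, incompatible)
bad = joint_loss(top, template, incompatible, compatible)
```

In this toy setup `good` is smaller than `bad`: the loss is low when the compatible bottom both outranks the incompatible one and sits near the template in embedding space.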

Paper Structure

This paper contains 21 sections, 15 equations, 10 figures, and 4 tables.

Figures (10)

  • Figure 1: In the proposed architecture, the CIGM model generates bottom templates. The GeCo model then uses the top, the generated template, and the candidate bottom images to evaluate their compatibility. This approach supports both compatibility modeling and complementary item retrieval.
  • Figure 2: Example images from [DBLP:conf/mspn/El-KaddouryMH19] illustrating differences in generation quality: (a) images generated by a VAE; (b) images sampled from a GAN.
  • Figure 3: The original Pix2Pix generator [DBLP:conf/cvpr/IsolaZZE17].
  • Figure 4: Complementary Item Generation Model.
  • Figure 5: Top: conditioning tops. Middle: ground-truth bottoms. Bottom: generated bottoms with the proposed generative model.
  • ...and 5 more figures