Garment Attribute Manipulation with Multi-level Attention

Vittorio Casula, Lorenzo Berlincioni, Luca Cultrera, Federico Becattini, Chiara Pero, Carmen Bisogni, Marco Bertini, Alberto Del Bimbo

TL;DR

GAMMA (Garment Attribute Manipulation with Multi-level Attention) is a novel framework that integrates attribute-disentangled representations with a multi-stage attention-based architecture. It enables targeted manipulation of fashion image attributes, allowing users to refine their searches with high accuracy.

Abstract

In the rapidly evolving field of online fashion shopping, the need for more personalized and interactive image retrieval systems has become paramount. Existing methods often struggle to manipulate specific garment attributes precisely without inadvertently affecting others. To address this challenge, we propose GAMMA (Garment Attribute Manipulation with Multi-level Attention), a novel framework that integrates attribute-disentangled representations with a multi-stage attention-based architecture. GAMMA enables targeted manipulation of fashion image attributes, allowing users to refine their searches with high accuracy. By leveraging a dual-encoder Transformer and a memory block, our model achieves state-of-the-art performance on popular datasets such as Shopping100k and DeepFashion.

Paper Structure

This paper contains 19 sections, 11 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Illustration of our model. The disentangled representation of the query image, $r_n$, is provided to GAMMA along with the manipulation indicator vector $i$ and the memory of prototype features $M$.
  • Figure 2: Illustration of GAMMA's internal architecture.
  • Figure 3: Qualitative results on the Shopping100k test set. The leftmost column shows the query image; the remaining columns show our model's retrieved results in ranked order. Green outlines mark the ground-truth correct answers.
  • Figure 4: Failure cases on the Shopping100k test set. The leftmost column shows the query image; the remaining columns show the ranked results. Green outlines mark the ground-truth correct answers.