
Fashion Recommendation: Outfit Compatibility using GNN

Samaksh Gulati

TL;DR

This work investigates graph-based representations for fashion outfit compatibility using two GNN-based frameworks: NGNN, a node-wise graph neural network, and HGNN, a hypergraph neural network. It evaluates these models on the Polyvore dataset across two tasks, Fill-In-The-Blank (FITB) and Compatibility Prediction (reported as AUC), and explores multimodal embeddings from images and text, including Vision Transformer and InceptionV3 features. The results indicate that HGNN provides a modest improvement over NGNN, particularly in AUC, and that multimodal embeddings yield the strongest performance gains. Overall, the study demonstrates that graph-based models, especially hypergraph-based ones, can capture higher-order item interactions and so help automate and improve outfit recommendations on fashion platforms.

Abstract

Numerous industries have benefited from machine learning, and the fashion industry is no exception. By gaining a better understanding of what makes a good outfit, companies can provide useful product recommendations to their users. In this project, we follow two existing approaches that represent outfits as graphs and use modified versions of the graph neural network (GNN) framework. Both the Node-wise Graph Neural Network (NGNN) and the Hypergraph Neural Network (HGNN) aim to score a set of items according to how compatible they are as an outfit. The data used is the Polyvore dataset, which consists of curated outfits with a product image and a text description for each item in an outfit. We recreate the analysis on a subset of this data and compare the two existing models on two tasks: Fill-in-the-blank (FITB), finding an item that completes an outfit, and compatibility prediction, estimating the compatibility of different items grouped as an outfit. We replicate the results directionally and find that HGNN performs slightly better on both tasks. In addition to replicating the results of the two papers, we also use embeddings generated from a vision transformer and observe improved prediction accuracy across the board.
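
To make the two evaluation tasks concrete, the following is a minimal sketch of how they could be scored, not the evaluation code used in this project. It assumes FITB is reported as accuracy over multiple-choice questions and that compatibility prediction is reported as AUC over scored outfits. The names `fitb_accuracy`, `pairwise_auc`, and `toy_score` are hypothetical, and `toy_score` is only a stand-in for a trained NGNN or HGNN scorer.

```python
from typing import Callable, List, Sequence, Tuple

Outfit = List[float]  # toy items: one "style" number each (illustrative only)
Question = Tuple[Sequence[float], Sequence[float], int]


def toy_score(outfit: Outfit) -> float:
    """Hypothetical stand-in for a model: tighter style spread scores higher."""
    return -(max(outfit) - min(outfit))


def fitb_accuracy(questions: Sequence[Question],
                  score_fn: Callable[[Outfit], float]) -> float:
    """Fraction of questions where the highest-scoring candidate is the answer.

    Each question is (partial_outfit, candidates, index_of_correct_candidate).
    """
    correct = 0
    for partial, candidates, answer in questions:
        # Complete the outfit with each candidate and score the result.
        scores = [score_fn(list(partial) + [c]) for c in candidates]
        predicted = max(range(len(candidates)), key=scores.__getitem__)
        correct += int(predicted == answer)
    return correct / len(questions)


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a compatible outfit (label 1) outscores
    an incompatible one (label 0); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    questions = [
        ((1.0, 1.2), (5.0, 1.1, 9.0, -3.0), 1),
        ((4.0, 4.5), (4.2, 0.0, 8.0, 10.0), 0),
    ]
    print("FITB accuracy:", fitb_accuracy(questions, toy_score))
    print("AUC:", pairwise_auc([0.9, 0.8, 0.3, 0.4], [1, 0, 1, 0]))
```

In both functions the model enters only through a scoring function over a set of items, which is why the same two metrics can be used to compare NGNN and HGNN directly.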

Paper Structure (23 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: NGNN Framework
  • Figure 2: NGNN Training Flow
  • Figure 3: Comparison between HGNN and GNN
  • Figure 4: Sample Hypergraph in the Polyvore data
  • Figure 5: Hypergraph Framework