Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search

Marie Al Ghossein, Ching-Wei Chen, Jason Tang

TL;DR

The paper aims to improve e-commerce product search by leveraging multimodal data. It introduces SQID, an image-enriched extension of the Amazon Shopping Queries Dataset (SQD) that adds image URLs and CLIP-based visual embeddings for roughly 190k products, alongside text query embeddings. Using pretrained models, it shows that combining text and image signals improves ranking performance on the ESCI Task 1 benchmark, as measured by NDCG. The dataset is a public resource for benchmarking and advancing multimodal learning in product search, with potential for fine-tuning pretrained models and end-to-end retrieval studies.

Abstract

Recent advances in the fields of Information Retrieval and Machine Learning have focused on improving the performance of search engines to enhance the user experience, especially in the world of online shopping. The focus has thus been on leveraging cutting-edge learning techniques and relying on large enriched datasets. This paper introduces the Shopping Queries Image Dataset (SQID), an extension of the Amazon Shopping Queries Dataset enriched with image information associated with 190,000 products. By integrating visual information, SQID facilitates research around multimodal learning techniques that can take into account both textual and visual information for improving product search and ranking. We also provide experimental results leveraging SQID and pretrained models, showing the value of using multimodal data for search and ranking. SQID is available at: https://github.com/Crossing-Minds/shopping-queries-image-dataset.

Paper Structure

This paper contains 12 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Different variants of a product on Amazon
  • Figure 2: NDCG for combinations of ranking approaches that mix text and image data for query-product ranking. Two approaches, $m_{1}$ and $m_{2}$, are combined through a weighted average of either their query-product similarities or their ranking lists; $w$ and $(1-w)$ are the weights assigned to $m_{1}$ and $m_{2}$ respectively, and $w$ is shown on the x-axis.
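
The weighted combination described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the scores are toy values, and averaging rank positions is only one assumed reading of "directly combining ranking lists". Whether the paper normalizes scores before mixing is not stated here, so none is applied.

```python
import numpy as np

def combine_scores(s1, s2, w):
    """Weighted average of two query-product similarity arrays.

    s1, s2: scores from approaches m1 and m2 for the same products.
    w: weight on m1; (1 - w) goes to m2.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    return w * s1 + (1.0 - w) * s2

def ranks_from_scores(scores):
    """Return 1-based rank positions, where rank 1 is the highest score."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks

def combine_rankings(s1, s2, w):
    """Weighted average of rank positions (assumed variant); lower is better."""
    return w * ranks_from_scores(s1) + (1.0 - w) * ranks_from_scores(s2)

# Toy similarities for three products under one query (made-up values).
text_sim = [0.9, 0.2, 0.5]   # e.g. a text-based approach, m1
image_sim = [0.1, 0.8, 0.6]  # e.g. an image-based approach, m2

fused_scores = combine_scores(text_sim, image_sim, w=0.5)
fused_ranks = combine_rankings(text_sim, image_sim, w=0.5)
```

Sweeping `w` from 0 to 1 and computing NDCG on the ordering produced at each value would trace a curve of the kind the caption describes, with `w = 1` reducing to $m_{1}$ alone and `w = 0` to $m_{2}$ alone.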