Retail-786k: a Large-Scale Dataset for Visual Entity Matching

Bianca Lamm, Janis Keuper

TL;DR

This paper introduces the first publicly available large-scale dataset for "visual entity matching", based on a production-level use case in the retail domain, and shows that the proposed "visual entity matching" constitutes a novel learning problem that cannot be sufficiently solved using standard image-based classification and retrieval algorithms.

Abstract

Entity Matching (EM) defines the task of learning to group objects by transferring semantic concepts from example groups (=entities) to unseen data. Despite the general availability of image data in the context of many EM problems, most currently available EM algorithms rely solely on (textual) metadata. In this paper, we introduce the first publicly available large-scale dataset for "visual entity matching", based on a production-level use case in the retail domain. Using scanned advertisement leaflets, collected over several years from different European retailers, we provide a total of ~786k manually annotated, high-resolution product images containing ~18k different individual retail products which are grouped into ~3k entities. The annotation of these product entities is based on a price comparison task, where each entity forms an equivalence class of comparable products. Based on a first baseline evaluation, we show that the proposed "visual entity matching" constitutes a novel learning problem that cannot be sufficiently solved using standard image-based classification and retrieval algorithms. Instead, novel approaches that can transfer example-based visual equivalence classes to new data are needed to address the proposed problem. The aim of this paper is to provide a benchmark for such algorithms. Information about the dataset, evaluation code, and download instructions is provided at https://www.retail-786k.org/.
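
To make the idea of an entity as an equivalence class of comparable products more concrete, here is a minimal Python sketch. It assumes a toy mapping from image identifiers to entity identifiers; all identifiers, the helper names, and the use of precision@k as a retrieval measure are illustrative assumptions and are not taken from Retail-786k or its evaluation code.

```python
from collections import defaultdict

# Hypothetical toy annotations: image id -> entity id (not real dataset entries).
image_to_entity = {
    "img_a": "entity_1",
    "img_b": "entity_1",
    "img_c": "entity_2",
    "img_d": "entity_2",
    "img_e": "entity_1",
}


def group_by_entity(mapping):
    """Collect the images of each entity, i.e. one equivalence class per entity."""
    groups = defaultdict(set)
    for image_id, entity_id in mapping.items():
        groups[entity_id].add(image_id)
    return dict(groups)


def same_entity(image_a, image_b, mapping):
    """Two images match if and only if they are annotated with the same entity."""
    return mapping[image_a] == mapping[image_b]


def precision_at_k(query, ranked_results, mapping, k):
    """Fraction of the top-k retrieved images that share the query's entity."""
    top_k = ranked_results[:k]
    if not top_k:
        return 0.0
    hits = sum(same_entity(query, result, mapping) for result in top_k)
    return hits / len(top_k)


if __name__ == "__main__":
    print(group_by_entity(image_to_entity))
    # A hypothetical ranking returned by some retrieval model for query "img_a".
    print(precision_at_k("img_a", ["img_b", "img_c", "img_e"], image_to_entity, k=3))
```

In this sketch, a retrieved image counts as correct whenever it belongs to the query's entity, even if it shows a different individual product, which is the distinction between entity matching and plain instance-level retrieval.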

Paper Structure (30 sections, 40 figures, 5 tables)

Figures (40)

  • Figure 1: In the context of retail products, the term "visual entity matching" refers to the task of linking individual product images from diverse sources to a semantic product grouping. Here, all images show different products from the same entity, which is defined by the fact that retailers use single images as "placeholders" to promote all products of the entity. For a higher-resolution version, refer to Figure \ref{fig:appendix_visual_abstract_explain} in the appendix.
  • Figure 2: Inside the dashed rectangle, samples from a representative entity show the typical intra-entity variations. In contrast, some samples are visually very similar although they belong to different entities. These samples are framed by the solid rectangle, circle, and polygon.
  • Figure 3: Illustration of retrieval samples showing the difficulty of matching products. The green-framed images belong to the same entity as the query image. The red-framed images belong to another entity. The sub-captions describe the difference between the query image and the falsely matched images. The images are presented in full resolution in Figure \ref{fig:appendix_image_retrieval_Chips} and Figure \ref{fig:appendix_image_retrieval_Apfelmus} in the appendix.
  • Figure 4: Illustration of two instances from different groupings.
  • Figure 5: Entity histograms by train and test split of the dataset.
  • ...and 35 more figures