Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition

Jielin Qiu; William Han; Winfred Wang; Zhengyuan Yang; Linjie Li; Jianfeng Wang; Christos Faloutsos; Lei Li; Lijuan Wang

TL;DR

This work introduces Entity6K, a comprehensive dataset for real-world entity recognition, featuring 5,700 entities across 26 categories, each supported by 5 human-verified images with annotations, addressing a gap in existing datasets.

Abstract

Open-domain real-world entity recognition is essential yet challenging, involving identifying various entities in diverse environments. The lack of a suitable evaluation dataset has been a major obstacle in this field due to the vast number of entities and the extensive human effort required for data curation. We introduce Entity6K, a comprehensive dataset for real-world entity recognition, featuring 5,700 entities across 26 categories, each supported by 5 human-verified images with annotations. Entity6K offers a diverse range of entity names and categorizations, addressing a gap in existing datasets. We conducted benchmarks with existing models on tasks like image captioning, object detection, zero-shot classification, and dense captioning to demonstrate Entity6K's effectiveness in evaluating models' entity recognition capabilities. We believe Entity6K will be a valuable resource for advancing accurate entity recognition in open-domain settings.

Paper Structure

This paper contains 62 sections, 4 figures, 16 tables.

Introduction
Related Work
Open-domain Entity Recognition
Zero-Shot Image Classification
Object Detection
Entity6K Dataset
Data Acquisition
Entity List
Data Collection and Licenses
Fidelity Control
Human Annotation
Bounding Box Annotation
Textual Description Annotation
Statistics of the Dataset
Experimental Settings
...and 47 more sections

Figures (4)

Figure 1: Comparison between Entity6K and existing datasets. Existing datasets may contain only a single large entity, ambiguous entity names, no bounding boxes, or short or missing captions, whereas Entity6K contains entities in complex environments, with specific names and human-labeled bounding boxes and captions.
Figure 2: Examples of the collected data in the Entity6K dataset, where each image is associated with the entity region (bounding box) and the textual descriptions, centering on the specific entity.
Figure 3: Statistics of the entities in each category.
Figure 4: Examples of the collected data in the Entity6K dataset, where each image is associated with the entity region (bounding box) and the textual descriptions, centering on the specific entity.
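
The captions above describe each image as paired with an entity region (bounding box) and a textual description, and the abstract states that each entity has 5 human-verified images. A minimal sketch of how one such record could be represented and sanity-checked is given below. All class names, field names, and the corner-based box convention are assumptions made for illustration; they are not the schema of the released Entity6K files.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Stated in the abstract: each entity is supported by 5 human-verified images.
IMAGES_PER_ENTITY = 5


@dataclass
class AnnotatedImage:
    """One image with its annotations (hypothetical layout)."""
    image_path: str
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    bbox: Tuple[float, float, float, float]
    description: str


@dataclass
class EntityRecord:
    """One entity with its category and annotated images (hypothetical layout)."""
    name: str
    category: str
    images: List[AnnotatedImage] = field(default_factory=list)


def is_complete(record: EntityRecord) -> bool:
    """Return True if the record has the expected image count, and every
    image has a non-degenerate box and a non-empty description."""
    if len(record.images) != IMAGES_PER_ENTITY:
        return False
    for img in record.images:
        x_min, y_min, x_max, y_max = img.bbox
        if not (x_min < x_max and y_min < y_max):
            return False
        if not img.description.strip():
            return False
    return True
```

A loader for the actual release would map its own file format onto a structure like this; the check is only a convenience for catching incomplete records before evaluation.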
