A Dataset for Named Entity Recognition and Relation Extraction from Art-historical Image Descriptions

Stefanie Schneider; Miriam Göldl; Julian Stalter; Ricarda Vollmer

TL;DR

FRAME is a manually annotated dataset of art-historical image descriptions for Named Entity Recognition (NER) and Relation Extraction (RE). It can be used to benchmark and fine-tune NER and RE systems, including zero- and few-shot setups with Large Language Models (LLMs).

Abstract

This paper introduces FRAME (Fine-grained Recognition of Art-historical Metadata and Entities), a manually annotated dataset of art-historical image descriptions for Named Entity Recognition (NER) and Relation Extraction (RE). Descriptions were collected from museum catalogs, auction listings, open-access platforms, and scholarly databases, then filtered to ensure that each text focuses on a single artwork and contains explicit statements about its material, composition, or iconography. FRAME provides stand-off annotations in three layers: a metadata layer for object-level properties, a content layer for depicted subjects and motifs, and a co-reference layer linking repeated mentions. Across layers, entity spans are labeled with 37 types and connected by typed RE links between mentions. Entity types are aligned with Wikidata to support Named Entity Linking (NEL) and downstream knowledge-graph construction. The dataset is released as UIMA XMI Common Analysis Structure (CAS) files with accompanying images and bibliographic metadata, and can be used to benchmark and fine-tune NER and RE systems, including zero- and few-shot setups with Large Language Models (LLMs).
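
To make the stand-off layout concrete, the following is a minimal sketch of how layered span annotations and typed relations can be represented and queried. It is an illustration only, not the FRAME schema or its XMI serialization: the example sentence is made up, and the class names, the labels (`Medium`, `Person`, `Object`), and the relation label `holds` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    begin: int  # inclusive character offset into the description text
    end: int    # exclusive character offset
    layer: str  # "metadata", "content", or "coreference"
    label: str  # entity type (hypothetical labels, not the FRAME type set)


@dataclass(frozen=True)
class Relation:
    source: Span
    target: Span
    label: str  # relation type (hypothetical)


def covered_text(text: str, span: Span) -> str:
    """Return the text a stand-off span points at; the text itself is never modified."""
    return text[span.begin:span.end]


def find_span(text: str, phrase: str, layer: str, label: str) -> Span:
    """Helper for this sketch: build a span from the first occurrence of a phrase."""
    begin = text.index(phrase)
    return Span(begin, begin + len(phrase), layer, label)


# Made-up description sentence, not taken from the dataset.
text = "The oil painting shows a woman holding a lute."

medium = find_span(text, "oil painting", "metadata", "Medium")
woman = find_span(text, "woman", "content", "Person")
lute = find_span(text, "lute", "content", "Object")
holds = Relation(source=woman, target=lute, label="holds")

# Group the covered text of each span by annotation layer.
by_layer = {}
for span in (medium, woman, lute):
    by_layer.setdefault(span.layer, []).append(covered_text(text, span))
```

Because annotations only store offsets and labels, several layers can overlap on the same text without interfering with one another, which is the property that makes a separate co-reference layer possible.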

Paper Structure (17 sections, 3 figures, 3 tables)

Background & Summary
Methods
Data Collection
Data Annotation
Entity Type Selection
Annotation Procedure
Annotation Challenges
Data Validation
Data Record
Data Overview
Technical Validation
Data Acquisition
Data Post-processing
Usage Notes
Data Availability
...and 2 more sections

Figures (3)

Figure 1: Each record in the dataset includes (1) the referenced artwork image, (2) basic artwork metadata, and (3) an art-historical text excerpt labeled with NER and RE annotations.
Figure 2: Our dataset creation process involves a modular, multi-stage pipeline, integrating both manual and automated components.
Figure 3: Four-step annotation procedure. (a) First, the text is read in full to obtain an overview, without creating annotations. (b) Second, expressions belonging to the metadata layer are annotated. (c) Third, expressions in the content layer are annotated. (d) Fourth and finally, co-references are annotated.
