Wiki-TabNER: Integrating Named Entity Recognition into Wikipedia Tables

Aneta Koleva, Martin Ringsquandl, Ahmed Hatem, Thomas Runkler, Volker Tresp

TL;DR

This work introduces Wiki-TabNER, a benchmark dataset that brings real-world Wikipedia tables into NER evaluation by preserving multi-entity cells and annotating entities with DBpedia semantic classes and Wikidata IDs. It details dataset construction from the WikiTables corpus, including table filtering, entity extraction, semantic annotation, and span-based labeling embedded in the table structure. The paper also proposes a prompting-based evaluation framework for large language models to perform within-table NER, accompanied by qualitative analysis and ablation studies that reveal challenges in type prediction and label granularity. Findings show that while in-context learning improves NER within tables, the task remains difficult for current LLMs, underscoring the need for robust table NER benchmarks; the dataset is released to foster further research, with future work including multi-label classification and EL integration.

Abstract

Interest in solving table interpretation tasks has grown over the years, yet evaluation still relies on existing datasets that may be overly simplified. This potentially reduces the effectiveness of these datasets for thorough evaluation and fails to accurately represent tables as they appear in the real world. To enrich the existing benchmark datasets, we extract and annotate a new, more challenging dataset. The proposed Wiki-TabNER dataset features complex tables containing several entities per cell, with named entities labeled using DBpedia classes. This dataset is specifically designed to address the named entity recognition (NER) task within tables, but it can also be used as a more challenging dataset for evaluating the entity linking task. In this paper, we describe the distinguishing features of the Wiki-TabNER dataset and the labeling process. In addition, we propose a prompting framework for evaluating the new large language models on the within-table NER task. Finally, we perform qualitative analysis to gain insights into the challenges encountered by the models and to understand the limitations of the proposed dataset.
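
To make the idea of several labeled entities inside one cell concrete, here is a minimal Python sketch of span-based annotation over a toy table. It is an illustration under stated assumptions, not the dataset's actual format: the `EntitySpan` record, the `annotate` helper, the string-matching gazetteer, and the toy table contents are all hypothetical. Only the label names `Work` and `Person` are taken from the entity types mentioned in the paper's figure captions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical record: one entity mention located inside one table cell."""
    row: int
    col: int
    start: int  # inclusive character offset within the cell text
    end: int    # exclusive character offset within the cell text
    label: str  # coarse semantic class, e.g. "Person"


def annotate(table, gazetteer):
    """Find every gazetteer mention in every cell and return labeled spans.

    Assumption: exact string matching stands in for whatever entity
    extraction the real dataset construction uses.
    """
    spans = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            for mention, label in gazetteer.items():
                start = cell.find(mention)
                while start != -1:
                    end = start + len(mention)
                    spans.append(EntitySpan(r, c, start, end, label))
                    start = cell.find(mention, end)
    return sorted(spans, key=lambda s: (s.row, s.col, s.start))


# Toy data invented for illustration; the second column holds two entities.
TABLE = [
    ["Title", "Cast"],
    ["Example Film", "Jane Doe, John Roe"],
]
GAZETTEER = {"Example Film": "Work", "Jane Doe": "Person", "John Roe": "Person"}

for span in annotate(TABLE, GAZETTEER):
    text = TABLE[span.row][span.col][span.start:span.end]
    print(span.row, span.col, span.start, span.end, span.label, repr(text))
```

Keeping character offsets per cell, rather than one label per cell, is what lets a single cell such as the cast list above carry two separate `Person` annotations.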

Paper Structure (30 sections, 1 equation, 7 figures, 4 tables)

This paper contains 30 sections, 1 equation, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Complex table retrieved from Wikipedia (left). Representation of the original table in the Wiki-TabNER dataset (center) and in the TURL dataset (right). The Wiki-TabNER table contains named entities of type: Work, Person, and Organization.
  • Figure 2: Number of entities per type in Wiki-TabNER.
  • Figure 3: Part of the first level of the DBpedia class tree. We highlight the classes with which we annotate the entities in the Wiki-TabNER dataset.
  • Figure 4: Example of a prompt with one-shot example. The instructions part is in red. One example table and its named entities are in blue. The input table for annotation is in violet.
  • Figure 5: Results from initial experiments with 2000 evaluation tables. Plot (a) shows the performance of the GPT-instruct model; the metrics do not change after the 600th table. Table (b) shows the substantial duration of these experiments.
  • ...and 2 more figures
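
The one-shot prompt layout described in the Figure 4 caption (instructions, then one example table with its named entities, then the input table) can be sketched roughly as follows. This is a hedged illustration only: the function names, the pipe-separated table serialization, the `mention -> label` output format, and the instruction wording are assumptions, not the prompt used in the paper.

```python
def serialize_table(table):
    """Render a table as pipe-separated rows (an assumed serialization)."""
    return "\n".join(" | ".join(row) for row in table)


def build_one_shot_prompt(instructions, example_table, example_entities, input_table):
    """Assemble the three prompt parts in order: instructions, example, input."""
    example_lines = "\n".join(
        f"{mention} -> {label}" for mention, label in example_entities
    )
    parts = [
        instructions,
        "Example table:",
        serialize_table(example_table),
        "Example entities:",
        example_lines,
        "Input table:",
        serialize_table(input_table),
        "Entities:",
    ]
    return "\n\n".join(parts)


# Hypothetical instruction text and toy tables, invented for illustration.
INSTRUCTIONS = "List every named entity in the table together with its type."
EXAMPLE_TABLE = [["Title", "Cast"], ["Example Film", "Jane Doe, John Roe"]]
EXAMPLE_ENTITIES = [
    ("Example Film", "Work"),
    ("Jane Doe", "Person"),
    ("John Roe", "Person"),
]
INPUT_TABLE = [["Album", "Label"], ["Sample Album", "Sample Records"]]

prompt = build_one_shot_prompt(
    INSTRUCTIONS, EXAMPLE_TABLE, EXAMPLE_ENTITIES, INPUT_TABLE
)
print(prompt)
```

Ending the prompt with an open `Entities:` line invites the model to continue in the same format as the worked example, which is the usual motivation for in-context examples.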