Wiki Entity Summarization Benchmark

Saeedeh Javadi, Atefeh Moradan, Mohammad Sorkhpar, Klim Zaporojets, Davide Mottin, Ira Assent

TL;DR

This work addresses the need for scalable, graph-aware benchmarks for entity summarization (ES) in knowledge graphs. It introduces WikES, an automatically constructed benchmark that derives summaries from Wikipedia abstracts linked to Wikidata and preserves graph topology through a random-walk-based generator, enabling large-scale, connected datasets. The authors provide four seed-domain datasets across three sizes with train/validation/test splits and demonstrate reduced bias compared to prior benchmarks, while showing competitive performance of graph-aware baselines like LinkSum on small graphs. The resources (data, code, and toolkit) are released publicly to accelerate evaluation and development of robust ES methods on realistic, structure-rich knowledge graphs.

Abstract

Entity summarization aims to compute concise summaries for entities in knowledge graphs. Existing datasets and benchmarks are often limited to a few hundred entities and discard the graph structure of the source knowledge graphs. This limitation is particularly pronounced for ground-truth summaries, where only a few labeled summaries exist for evaluation and training. We propose WikES, a comprehensive benchmark comprising entities, their summaries, and their connections. Additionally, WikES features a dataset generator to test entity summarization algorithms in different areas of the knowledge graph. Importantly, our approach combines graph algorithms and NLP models as well as different data sources such that WikES does not require human annotation, rendering the approach cost-effective and generalizable to multiple domains. Finally, WikES is scalable and capable of capturing the complexities of knowledge graphs in terms of topology and semantics. WikES also includes existing datasets for comparison. Empirical studies of entity summarization methods confirm the usefulness of our benchmark. Data, code, and models are available at: https://github.com/msorkhpar/wiki-entity-summarization.
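
To make the idea of a topology-preserving, random-walk-based dataset generator more concrete, the following is a minimal sketch of random-walk subgraph sampling from seed entities. It is not the WikES implementation: the function name `random_walk_sample`, its parameters, and the toy graph are all assumptions introduced purely for illustration.

```python
import random


def random_walk_sample(adjacency, seeds, walk_length, walks_per_seed, rng):
    """Collect the nodes and edges visited by short random walks from seeds.

    Hypothetical helper for illustration only; the actual WikES generator
    may use a different walk strategy and different stopping criteria.
    """
    nodes = set(seeds)
    edges = set()
    for seed in seeds:
        for _ in range(walks_per_seed):
            current = seed
            for _ in range(walk_length):
                neighbors = adjacency.get(current, [])
                if not neighbors:
                    break  # dead end: this walk stops early
                nxt = rng.choice(neighbors)
                edges.add((current, nxt))
                nodes.add(nxt)
                current = nxt
    return nodes, edges


# Toy directed graph (made-up data): node -> list of outgoing neighbors.
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
    "E": ["A"],
}

nodes, edges = random_walk_sample(
    toy_graph, ["A"], walk_length=3, walks_per_seed=5, rng=random.Random(0)
)
```

Because every sampled edge lies on a walk that starts at a seed, the sampled subgraph stays connected to the seeds, which is the property that distinguishes this style of sampling from picking entities independently.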

Paper Structure

This paper contains 21 sections, 6 equations, 12 figures, 11 tables, and 1 algorithm.

Figures (12)

  • Figure 1: KG subgraph of entity Ellen Johnson Sirleaf: arrows depict the subgraph of relationships to other entities, and labels indicate their roles. Selecting the bold edges as entity summaries of the most relevant triples may reduce information overload while concisely describing the entity.
  • Figure 2: F1 score and MAP for frequency statistics on ESBM datasets.
  • Figure 3: F1 for frequency statistics on WikiProFem.
  • Figure 4: MAP for frequency statistics on WikiProFem.
  • Figure 5: F1 for frequency statistics on WikiLitArt.
  • ...and 7 more figures
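
Several of the figures listed above report F1 and MAP. As a rough guide to what these metrics measure for a ranked list of triples, here is a minimal sketch using the standard information-retrieval definitions. The exact evaluation protocol of the paper (for example, the choice of k or how multiple reference summaries are handled) is not stated in this section, so the function names and the toy data below are assumptions.

```python
def f1_at_k(ranked, gold, k):
    """F1 between the top-k ranked items and the gold summary set."""
    top_k = set(ranked[:k])
    hits = len(top_k & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_precision(ranked, gold):
    """Average of precision values at each rank where a gold item appears."""
    if not gold:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in gold:
            hits += 1
            total += hits / rank
    return total / len(gold)


def mean_average_precision(rankings, golds):
    """Mean of average precision over several entities."""
    scores = [average_precision(r, g) for r, g in zip(rankings, golds)]
    return sum(scores) / len(scores)


# Toy example (made-up triple identifiers, not benchmark data).
ranked_triples = ["t1", "t2", "t3", "t4"]
gold_summary = {"t1", "t3"}

f1 = f1_at_k(ranked_triples, gold_summary, k=2)
ap = average_precision(ranked_triples, gold_summary)
```

In this toy case the top-2 list contains one of the two gold triples, so precision and recall are both 0.5 and F1 is 0.5. Average precision is (1/1 + 2/3) / 2, roughly 0.83, which rewards the ranking for placing a gold triple first.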