YAGO 4.5: A Large and Clean Knowledge Base with a Rich Taxonomy
Fabian Suchanek, Mehwish Alam, Thomas Bonald, Lihu Chen, Pierre-Henri Paris, Jules Soria
TL;DR
This work presents YAGO 4.5, a large, logically consistent knowledge base that extends YAGO 4 by reintegrating a substantial portion of the Wikidata taxonomy under the Schema.org upper taxonomy. It articulates design principles that balance a clean upper taxonomy with an informative lower taxonomy. It also implements a scalable, Python-based pipeline that merges data from Wikidata and Schema.org, enforces logical constraints, and reasons over the result while avoiding cycles and redundant subclass links. An intrinsic evaluation shows a tighter, more coherent taxonomy. An extrinsic evaluation shows improved entity disambiguation, particularly for mentions with many candidate entities, which illustrates practical benefits for information retrieval and semantic search. The resource is publicly available through SPARQL access and as downloadable data, so researchers and practitioners can use the richer, more navigable taxonomy for knowledge-driven tasks. Open challenges remain in maintaining the resource and in further expanding domain-specific classes.
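To make the cycle and redundancy constraints concrete, the following minimal Python sketch merges lower-level subclass edges into an upper taxonomy. It is a generic illustration under stated assumptions, not the YAGO 4.5 pipeline. The function names `reaches` and `merge_taxonomy` and the toy classes are hypothetical, and the upper taxonomy is assumed to be acyclic. A lower edge is rejected if its parent can already reach its child, since adding it would close a cycle. Redundant edges are then removed by transitive reduction: an edge is dropped when the parent remains reachable through another path.

```python
from collections import defaultdict


def reaches(graph, start, goal):
    """Return True if `goal` is reachable from `start` via subclass edges (iterative DFS)."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def merge_taxonomy(upper_edges, lower_edges):
    """Hypothetical helper: add lower-taxonomy (child, parent) edges on top of an
    upper taxonomy, skip edges that would close a cycle, then drop redundant edges."""
    graph = defaultdict(set)  # child -> set of direct parents
    for child, parent in upper_edges:  # assumed to be acyclic already
        graph[child].add(parent)
    for child, parent in lower_edges:
        if child == parent or reaches(graph, parent, child):
            continue  # edge would create a cycle: skip it
        graph[child].add(parent)
    # Transitive reduction: remove child -> parent if parent is still
    # reachable from the child through one of its other parents.
    for child in list(graph):
        for parent in list(graph[child]):
            others = graph[child] - {parent}
            if any(reaches(graph, other, parent) for other in others):
                graph[child].discard(parent)
    return graph


# Toy example (hypothetical classes).
upper = [("Person", "Thing"), ("Place", "Thing")]
lower = [
    ("Scientist", "Person"),
    ("Physicist", "Scientist"),
    ("Physicist", "Person"),   # redundant: implied via Scientist
    ("Person", "Physicist"),   # would close a cycle
]
taxonomy = merge_taxonomy(upper, lower)
```

Because lower edges are processed in input order, which of two conflicting edges survives depends on that order; a production pipeline would likely need a principled tie-breaking rule, for example preferring edges that agree with the upper taxonomy.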
Abstract
Knowledge Bases (KBs) find applications in many knowledge-intensive tasks and, most notably, in information retrieval. Wikidata is one of the largest public general-purpose KBs. Yet, its collaborative nature has led to a convoluted schema and taxonomy. The YAGO 4 KB cleaned up the taxonomy by incorporating the ontology of Schema.org, resulting in a cleaner structure amenable to automated reasoning. However, it also cut away large parts of the Wikidata taxonomy, which is essential for information retrieval. In this paper, we extend YAGO 4 with a large part of the Wikidata taxonomy, while respecting logical constraints and the distinction between classes and instances. This yields YAGO 4.5, a new, logically consistent version of YAGO that adds a rich layer of informative classes. An intrinsic and an extrinsic evaluation show the value of the new resource.
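The class/instance distinction and the role of logical constraints can be illustrated with a second minimal Python sketch. For illustration only, it assumes two hypothetical filtering rules. First, a candidate class is discarded if the same entity is also used as an instance. Second, a candidate class is discarded if its superclasses include both members of a pair of classes declared disjoint, because such a class could have no instances. The names `ancestors` and `filter_classes` and the toy classes `HybridClass` and `Paris` are hypothetical, and the rules used to build YAGO 4.5 may differ.

```python
def ancestors(parents, node):
    """All superclasses of `node` (transitive), given a child -> set-of-parents map."""
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(parents.get(cls, ()))
    return seen


def filter_classes(candidates, parents, instances, disjoint_pairs):
    """Hypothetical filter: keep candidate classes that (a) are not also used
    as instances and (b) do not fall under two classes declared disjoint."""
    kept = []
    for cls in candidates:
        if cls in instances:
            continue  # assumed policy: an entity cannot be both a class and an instance
        superclasses = ancestors(parents, cls)
        if any(a in superclasses and b in superclasses for a, b in disjoint_pairs):
            continue  # under two disjoint classes: the class would be logically empty
        kept.append(cls)
    return kept


# Toy data (hypothetical).
parents = {
    "Person": {"Thing"},
    "Place": {"Thing"},
    "Physicist": {"Person"},
    "HybridClass": {"Person", "Place"},  # violates Person/Place disjointness
    "Paris": {"Place"},                  # also used as an instance below
}
kept = filter_classes(
    ["Physicist", "HybridClass", "Paris"],
    parents,
    instances={"Paris"},
    disjoint_pairs=[("Person", "Place")],
)
```

In this toy setting only `Physicist` survives: `HybridClass` inherits from two disjoint classes, and `Paris` is rejected because it is also used as an instance.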
