KIF: A Wikidata-Based Framework for Integrating Heterogeneous Knowledge Sources
Guilherme Lima, João M. B. Rodrigues, Marcelo Machado, Elton Soares, Sandro R. Fiorini, Raphael Thiago, Leonardo G. Azevedo, Viviane T. da Silva, Renato Cerqueira
TL;DR
KIF is a Wikidata-based framework for virtually integrating heterogeneous knowledge sources. It uses a store abstraction to provide Wikidata-like views of diverse data sources, and it introduces a pattern-driven query interface and a mixer store that federates multiple sources while preserving provenance through annotations. The framework includes a SPARQL store, a PubChem mapping, and additional store types (RDF, CSV), enabling integration with sources such as Wikidata, PubChem, and IBM CIRCA via Ontop. An application in chemistry demonstrates the approach, and an evaluation shows that KIF's overhead is negligible compared to endpoint processing time. The work highlights the practicality of Wikidata as a universal integration model and outlines future improvements such as parallel querying, mutable stores, and formal semantics.
Abstract
We present a Wikidata-based framework, called KIF, for virtually integrating heterogeneous knowledge sources. KIF is written in Python and is released as open source. It leverages Wikidata's data model and vocabulary, plus user-defined mappings, to construct a unified view of the underlying sources while keeping track of the context and provenance of their statements. The underlying sources can be triplestores, relational databases, CSV files, etc., which may or may not use the vocabulary and RDF encoding of Wikidata. The end result is a virtual knowledge base that behaves like an "extended Wikidata" and that can be queried using a simple but expressive pattern language, defined in terms of Wikidata's data model. In this paper, we present the design and implementation of KIF, discuss how we have used it to solve a real integration problem in the domain of chemistry (involving Wikidata, PubChem, and IBM CIRCA), and present experimental results on the performance and overhead of KIF.
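
To make the idea of a unified, pattern-queryable view with provenance more concrete, the following is a minimal sketch in Python. It does not use KIF's actual API: the names Statement, AnnotatedStatement, ListStore, and MixerStore, the filter method signature, and the toy data are all assumptions introduced here purely for illustration. Each store exposes the same pattern-matching interface, and a mixer forwards a pattern to its child stores and annotates every match with the name of the store it came from.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Protocol


@dataclass(frozen=True)
class Statement:
    """Simplified Wikidata-style statement (hypothetical, not KIF's class)."""
    subject: str
    property: str
    value: str


@dataclass(frozen=True)
class AnnotatedStatement:
    """A statement together with the name of the store that produced it."""
    statement: Statement
    source: str


class Store(Protocol):
    """Common interface assumed for every source in this sketch."""
    name: str

    def filter(self, subject: Optional[str] = None,
               property: Optional[str] = None,
               value: Optional[str] = None) -> Iterator[Statement]: ...


class ListStore:
    """In-memory stand-in for a real source (SPARQL endpoint, CSV file, ...)."""

    def __init__(self, name: str, statements: Iterable[Statement]) -> None:
        self.name = name
        self._statements: List[Statement] = list(statements)

    def filter(self, subject: Optional[str] = None,
               property: Optional[str] = None,
               value: Optional[str] = None) -> Iterator[Statement]:
        # A field set to None acts as a wildcard in the pattern.
        for stmt in self._statements:
            if subject is not None and stmt.subject != subject:
                continue
            if property is not None and stmt.property != property:
                continue
            if value is not None and stmt.value != value:
                continue
            yield stmt


class MixerStore:
    """Forwards a pattern to each child store, in order, and tags the results."""

    def __init__(self, children: Iterable[Store]) -> None:
        self.children: List[Store] = list(children)

    def filter(self, subject: Optional[str] = None,
               property: Optional[str] = None,
               value: Optional[str] = None) -> Iterator[AnnotatedStatement]:
        for child in self.children:
            for stmt in child.filter(subject, property, value):
                yield AnnotatedStatement(stmt, source=child.name)


if __name__ == "__main__":
    # Toy data; the identifiers and values are placeholders, not real records.
    source_a = ListStore("source-a", [Statement("benzene", "formula", "C6H6")])
    source_b = ListStore("source-b", [Statement("benzene", "note", "toy value")])
    mixer = MixerStore([source_a, source_b])
    for hit in mixer.filter(subject="benzene"):
        print(hit.source, hit.statement)
```

In this sketch the child stores are queried sequentially and results are not deduplicated; a real integration layer would also need mappings that translate each source's native vocabulary into the shared data model.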
