Uncertainty Management in the Construction of Knowledge Graphs: a Survey

Lucas Jarnac, Yoan Chabot, Miguel Couceiro

TL;DR

This survey addresses uncertainty management in knowledge graph construction by analyzing how unreliable or conflicting data from heterogeneous sources propagate through extraction, alignment, and fusion. It articulates a taxonomy of knowledge deltas (e.g., granularity vs. contradictions) and proposes an ideal pipeline with alignment, fusion, and consistency checks augmented by provenance metadata. The paper reviews both open and enterprise KGs, various knowledge extraction methods (text, web, and probing with LLMs), and the state of the art in uncertain KG embeddings, knowledge alignment, and knowledge fusion, including numerous datasets used for evaluation. It concludes with perspectives on integrating uncertainty at both the ontology and data-model levels, calling for holistic, provenance-aware solutions to improve KG quality and reliability in real-world applications.

Abstract

Knowledge Graphs (KGs) are a major asset for companies thanks to their great flexibility in data representation and their numerous applications, e.g., vocabulary sharing, Q/A, or recommendation systems. To build a KG, it is common practice to rely on automatic methods for extracting knowledge from various heterogeneous sources. But in a noisy and uncertain world, knowledge may not be reliable, and conflicts between data sources may occur. Integrating unreliable data would directly impact the use of the KG; therefore, such conflicts must be resolved. This could be done manually by selecting the best data to integrate. This first approach is highly accurate, but costly and time-consuming. That is why recent efforts focus on automatic approaches, which is a challenging task since it requires handling the uncertainty of extracted knowledge throughout its integration into the KG. We survey state-of-the-art approaches in this direction and present the construction of both open and enterprise KGs and how their quality is maintained. We then describe different knowledge extraction methods, which introduce additional uncertainty. We also discuss downstream tasks after knowledge acquisition, including KG completion using embedding models, knowledge alignment, and knowledge fusion, in order to address the problem of knowledge uncertainty in KG construction. We conclude with a discussion of the remaining challenges and perspectives when constructing a KG that takes uncertainty into account.

Paper Structure

This paper contains 27 sections, 10 equations, 13 figures, 2 tables, 1 algorithm.

Figures (13)

  • Figure 1: Illustration of a contradiction between two sources. One of the sources claims that the mandate of Jacques Chirac as mayor of Paris began on March 25 while the other claims that it began on March 20.
  • Figure 2: Distribution of paper publication years by topic: uncertain KG embedding, knowledge fusion, knowledge alignment, and uncertainty representation.
  • Figure 3: TBox stands for terminology box that contains classes and properties; ABox stands for assertion box that contains instances (e.g., Galaxy S23) and values (e.g., "2023").
  • Figure 4: Illustration of knowledge extraction from a single sentence. LOD stands for Linked Open Data.
  • Figure 5: Illustration of the different possible deltas about some topics between English Wikipedia and Wikidata: (a) invalidity + vagueness, (b) fuzziness, (c) timeliness, (d) ambiguity, and (e) incompleteness.
  • ...and 8 more figures