From Latent to Lucid: Transforming Knowledge Graph Embeddings into Interpretable Structures with KGEPrisma

Christoph Wehner, Chrysa Iliopoulou, Ute Schmid, Tarek R. Besold

TL;DR

Knowledge Graph Embeddings (KGEs) power link prediction but suffer from opaque decision processes. KGEPrisma provides post-hoc, local explanations by decoding latent embeddings into symbolic clauses drawn from the subgraph neighborhoods of similar embeddings, using a five-step workflow that includes kNN search, positive/negative pair construction, clause mining, surrogate-model-based ranking, and grounding into rule-, instance-, and analogy-based explanations. The method yields faithful explanations without retraining, scales to large graphs, and demonstrates state-of-the-art faithfulness across multiple benchmarks (FB15k-237, WN18RR, Kinship) while delivering competitive runtimes. This approach enables transparent, human-understandable insights into KGE predictions and is adaptable to diverse user needs and domains, including potential biomedical applications.

Abstract

In this paper, we introduce a post-hoc and local explainable AI method tailored for Knowledge Graph Embedding (KGE) models. These models are essential to Knowledge Graph Completion yet criticized for their opaque, black-box nature. Despite their significant success in capturing the semantics of knowledge graphs through high-dimensional latent representations, their inherent complexity poses substantial challenges to explainability. While existing methods like Kelpie use resource-intensive perturbation to explain KGE models, our approach directly decodes the latent representations encoded by KGE models by leveraging the smoothness of the embeddings: similar embeddings reflect similar behaviors within the Knowledge Graph, meaning that nodes are embedded similarly because their graph neighborhoods look similar. By identifying symbolic structures, in the form of triples, within the subgraph neighborhoods of similarly embedded entities, our method identifies the statistical regularities on which the models rely and translates these insights into human-understandable symbolic rules and facts. This bridges the gap between the abstract representations of KGE models and their predictive outputs, offering clear, interpretable insights. Key contributions include a novel post-hoc and local explainable AI method for KGE models that provides immediate, faithful explanations without retraining, facilitating real-time application on large-scale knowledge graphs. The method's flexibility enables the generation of rule-based, instance-based, and analogy-based explanations, meeting diverse user needs. Extensive evaluations show the effectiveness of our approach in delivering faithful and well-localized explanations, enhancing the transparency and trustworthiness of KGE models.
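The smoothness idea in the abstract can be illustrated with a small sketch. This is not KGEPrisma's implementation: the toy triples, the two-dimensional embeddings, the Euclidean distance, and the helper names (`nearest_neighbors`, `outgoing_patterns`, `shared_patterns`) are all assumptions made for illustration. The sketch finds the nearest neighbors of an entity in embedding space and counts which of its 1-hop (relation, tail) patterns recur in their neighborhoods.

```python
from collections import Counter

import numpy as np

# Toy knowledge graph as (head, relation, tail) triples; all names are hypothetical.
triples = [
    ("virus_a", "has_protein", "spike"),
    ("virus_b", "has_protein", "spike"),
    ("virus_a", "related_to", "virus_b"),
    ("virus_b", "binds_to", "receptor_x"),
    ("plant_c", "grows_in", "soil"),
]

# Toy entity embeddings standing in for the output of a trained KGE model.
embeddings = {
    "virus_a": np.array([0.9, 0.1]),
    "virus_b": np.array([0.8, 0.2]),
    "plant_c": np.array([-0.7, 0.6]),
}


def nearest_neighbors(entity, k):
    """Return the k entities whose embeddings are closest (Euclidean) to `entity`."""
    others = [e for e in embeddings if e != entity]
    dists = [np.linalg.norm(embeddings[entity] - embeddings[e]) for e in others]
    order = np.argsort(dists)[:k]
    return [others[i] for i in order]


def outgoing_patterns(entity):
    """Collect the (relation, tail) pairs in the 1-hop outgoing neighborhood."""
    return {(r, t) for h, r, t in triples if h == entity}


def shared_patterns(entity, k):
    """Count how often each pattern of `entity` recurs among its k nearest neighbors."""
    own = outgoing_patterns(entity)
    counts = Counter()
    for neighbor in nearest_neighbors(entity, k):
        counts.update(own & outgoing_patterns(neighbor))
    return counts


print(nearest_neighbors("virus_a", 1))
print(shared_patterns("virus_a", 1))
```

In this toy graph, the pattern shared between `virus_a` and its closest neighbor is the kind of symbolic regularity that a decoding approach could surface as a candidate explanation; the ranking and grounding steps named in the TL;DR are not covered here.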

Paper Structure

This paper contains 24 sections, 21 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: KGEPrisma generates explanations for KGE models in the form of subgraphs, uncovering the reasoning of the KGE model and building trust in the model's prediction. For example, the KGE model predicts that Sars-Cov-2 enters the respiratory cell via the ACE2 receptor. The explanation subgraph by KGEPrisma uncovers that Sars-Cov-2 has a Spike (S) protein. Furthermore, it shows that Sars-Cov-2 is related to Sars-Cov-1; thus, both are likely to behave similarly. Sars-Cov-1 also has a Spike (S) protein, which Sars-Cov-1 uses to bind to the ACE2 receptor, enabling it to enter the respiratory cell.
  • Figure 2: KGEPrisma generates explanations of KGE models in five steps. The five steps are discussed in detail in the method section.
  • Figure 3: DistMult in FB15k-237 predicts that Jessica Eisenberg's net worth is measured in US dollars. KGEPrisma explains this by pointing to the movies Jessica Eisenberg performed in and their budget currency, US dollars, which is a sensible explanation. AnyBURLExplainer instead points to the fact that Jessica Eisenberg was nominated for an award alongside Bill Hader as an explanation for her net worth being measured in US dollars. This explanation is not sensible.
  • Figure 4: TransE in WN18RR predicts that the bacteria family serves as a hypernym for the Treponemataceae family. An instance-based explanation provided by KGEPrisma explains this relationship by indicating that the Treponemataceae family is a subset (or meronym) of the Spirochaetales order. This order belongs to the Eubacteria division, which also includes the bacteria family. Thus, the instance-based explanation effectively demonstrates why the bacteria family can be considered a hypernym for the Treponemataceae family. Supporting the instance-based explanation, the analogy-based explanation provided by KGEPrisma points out to the user that the triple $(family\_Bacillaceae, hypernym, bacteria\_family)$ is similar to the predicted triple. In the context of this triple, we know that the Bacillaceae family is a subset of the Eubacteriales order, which in turn belongs to the Eubacteria division that includes the bacteria family. This analogous example reassures the user that the reasoning pattern leading to the prediction aligns with other established facts within the KG.
  • Figure 5: Mean execution time (over 10 runs on the same hardware, 40 explanations per run) for AnyBURLExplainer, Kelpie, Data Poisoning, and KGEPrisma explaining TransE, DistMult, and ConvE in the Kinship KG.
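The measurement protocol named in the Figure 5 caption (a mean over 10 runs, 40 explanations per run) could be reproduced with a harness along the following lines. This is a sketch under assumptions: `mean_runtime` and `dummy_explainer` are hypothetical names, the explainer is a stand-in rather than any of the compared methods, and the caption does not say whether the reported time is per run or per explanation, so the sketch reports the mean wall-clock time per run.

```python
import time
from statistics import mean


def mean_runtime(explain, triples, runs=10):
    """Mean wall-clock seconds per run; one run explains every triple once."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        for triple in triples:
            explain(triple)
        durations.append(time.perf_counter() - start)
    return mean(durations)


# Hypothetical stand-in for an explainer; it only records that it was called.
calls = []


def dummy_explainer(triple):
    calls.append(triple)
    return []


# 40 placeholder triples to explain per run, matching the caption's setup.
sample = [(f"entity_{i}", "relation", "target") for i in range(40)]
avg_seconds = mean_runtime(dummy_explainer, sample, runs=10)
print(f"mean seconds per run: {avg_seconds:.6f}")
```

Swapping `dummy_explainer` for a real explainer's call, and keeping the hardware fixed across methods as the caption states, would give comparable per-run figures.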