
KGExplainer: Towards Exploring Connected Subgraph Explanations for Knowledge Graph Completion

Tengfei Ma, Xiang Song, Wen Tao, Mufei Li, Jiani Zhang, Xiaoqin Pan, Jianxin Lin, Bosheng Song, Xiangxiang Zeng

TL;DR

This work tackles the opacity of knowledge graph embedding (KGE)–based knowledge graph completion (KGC) by proposing KGExplainer, a post-hoc framework that identifies connected subgraph explanations within the enclosing subgraph of a target prediction $\left<h,r,t\right>$ and evaluates them with a distilled subgraph evaluator. It combines a perturbation-based greedy search that extracts key subgraphs with a relational graph neural network–based evaluator trained to mimic the target KGE model, which enables quantitative assessment of explanation fidelity. Empirical results on WN-18, Family-rr, and FB15k-237 show that KGExplainer achieves predictive performance close to that of the original KGE models while delivering significantly improved explainability metrics (Recall@1 and F1@1) and strong human preference (83.3%). The approach scales to large KGs through local subgraph retraining and bounded complexity, and it is robust across RotatE, TransE, and DistMult, highlighting its practical potential for accountable KGC and its wider applicability to other domains.
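The perturbation idea can be illustrated with a small sketch. Assuming a scoring function that rates the target prediction given a set of edges, an entity's importance is taken to be how much the score drops when every edge touching that entity is removed. The `toy_score` function (number of shared neighbours of head and tail), the relation names `locatedIn` and `hasCapital`, and the distractor entity `Beijing` are hypothetical stand-ins; they are not the KGE models or data used in the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def remove_entity(edges: List[Triple], entity: str) -> List[Triple]:
    """Perturbation: drop every edge incident to `entity`."""
    return [(h, r, t) for (h, r, t) in edges if entity not in (h, t)]


def perturbation_importance(
    edges: List[Triple],
    head: str,
    tail: str,
    score: Callable[[List[Triple]], float],
) -> Dict[str, float]:
    """Importance of each non-endpoint entity = score drop when it is removed."""
    base = score(edges)
    entities = {e for (h, _, t) in edges for e in (h, t)} - {head, tail}
    return {e: base - score(remove_entity(edges, e)) for e in sorted(entities)}


def toy_score(edges: List[Triple], head: str, tail: str) -> float:
    """HYPOTHETICAL stand-in for a KGE score: shared neighbours of head and tail."""
    nbrs: Dict[str, Set[str]] = {}
    for h, _, t in edges:
        nbrs.setdefault(h, set()).add(t)
        nbrs.setdefault(t, set()).add(h)
    return float(len(nbrs.get(head, set()) & nbrs.get(tail, set())))


# Toy subgraph around the Figure 1 prediction <Alibaba, isCompetitor, JD.com>.
edges = [
    ("Alibaba", "investIn", "Retail"),
    ("Retail", "isInvestedBy", "JD.com"),
    ("Alibaba", "locatedIn", "China"),
    ("JD.com", "locatedIn", "China"),
    ("China", "hasCapital", "Beijing"),  # hypothetical distractor edge
]
importance = perturbation_importance(
    edges, "Alibaba", "JD.com", score=lambda es: toy_score(es, "Alibaba", "JD.com")
)
print(importance)  # {'Beijing': 0.0, 'China': 1.0, 'Retail': 1.0}
```

Removing `Retail` or `China` breaks one head–tail connection each, while removing the distractor `Beijing` changes nothing, so only the first two would be worth keeping in an explanation.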

Abstract

Knowledge graph completion (KGC) aims to alleviate the inherent incompleteness of knowledge graphs (KGs), which is critical for various applications, such as recommendation on the web. Although knowledge graph embedding (KGE) models have demonstrated superior predictive performance on KGC tasks, they infer missing links in a black-box manner that lacks transparency and accountability, preventing researchers from developing accountable models. Existing KGE-based explanation methods focus on exploring key paths or isolated edges as explanations, which provide too little information to reason about the target prediction. Additionally, the lack of ground truth makes these methods ineffective at quantitatively evaluating the explored explanations. To overcome these limitations, we propose KGExplainer, a model-agnostic method that identifies connected subgraph explanations and distills an evaluator to assess them quantitatively. KGExplainer employs a perturbation-based greedy search algorithm to find key connected subgraphs as explanations within the local structure of target predictions. To evaluate the quality of the explored explanations, KGExplainer distills an evaluator from the target KGE model. By forwarding the explanations to the evaluator, our method can examine their fidelity. Extensive experiments on benchmark datasets demonstrate that KGExplainer yields promising improvements and achieves an optimal ratio of 83.3% in human evaluation.
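One way to picture the distillation step is sketched below. It assumes that the pre-trained KGE model (teacher) and the evaluator (student) each emit one logit per candidate relation for a query $\left<h,?,t\right>$, and that the student is trained to match the teacher's temperature-softened distribution under a KL-divergence loss, as in standard knowledge distillation. The relational GNN is abstracted away as a list of logits, the scores are toy values, and `top1_agrees` is a hypothetical fidelity check; the paper's actual objective and metric may differ.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over candidate relations, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def distillation_loss(teacher: List[float], student: List[float], temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 as in standard
    knowledge distillation. `teacher` = KGE logits, `student` = evaluator logits."""
    p = softmax(teacher, temperature)
    q = softmax(student, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def top1_agrees(teacher: List[float], student: List[float]) -> bool:
    """Hypothetical fidelity check: do both models rank the same relation first?"""
    best_teacher = max(range(len(teacher)), key=teacher.__getitem__)
    best_student = max(range(len(student)), key=student.__getitem__)
    return best_teacher == best_student


kge_scores = [3.0, 1.0, 0.2]        # toy teacher logits for 3 candidate relations
evaluator_scores = [2.5, 1.2, 0.1]  # toy student logits computed on an explanation subgraph
print(distillation_loss(kge_scores, kge_scores))            # 0.0
print(distillation_loss(kge_scores, evaluator_scores) > 0)  # True
print(top1_agrees(kge_scores, evaluator_scores))            # True
```

Under this reading, a low loss on full enclosing subgraphs indicates that the evaluator mimics the KGE model, and top-1 agreement on an explanation-only input indicates that the explanation preserves the prediction.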

Paper Structure (47 sections, 2 theorems, 11 equations, 7 figures, 11 tables, 1 algorithm)

Key Result

Proposition 1

Given $\mathcal{G}=(\mathcal{V},\mathcal{R},\mathcal{E})$, a predicted fact $\left<h,r,t\right>$, the maximum length of paths $L$, and the number of paths $N$, the size of $g$ is $O(\frac{LN|\mathcal{V}|}{|\mathcal{E}|})$ and the size of $\mathcal{N}_{extend}$ is $O(LN)$.
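The $O(LN)$ term can be made concrete under one assumption: that $\mathcal{N}_{extend}$ collects the intermediate entities of at most $N$ simple paths between $h$ and $t$, each with at most $L$ edges. Such a path contributes at most $L-1$ intermediate entities, so the union has at most $N(L-1)$ entities, which is $O(LN)$. The sketch below enumerates such paths with a depth-first search; `bounded_paths`, `extended_entities`, and the toy triples are hypothetical illustrations, not the paper's construction of $g$ or $\mathcal{N}_{extend}$.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def bounded_paths(
    edges: List[Triple], head: str, tail: str, max_len: int, max_paths: int
) -> List[List[str]]:
    """DFS for at most `max_paths` simple head->tail entity paths with at most `max_len` edges."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for h, _, t in edges:  # ignore edge direction and relation type for the search
        adj[h].add(t)
        adj[t].add(h)
    found: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if len(found) >= max_paths:  # path budget N exhausted
            return
        if node == tail:
            found.append(list(path))
            return
        if len(path) - 1 >= max_len:  # edge budget L used up without reaching tail
            return
        for nxt in sorted(adj[node]):
            if nxt not in path:  # keep paths simple
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(head, [head])
    return found


def extended_entities(paths: List[List[str]]) -> Set[str]:
    """Union of intermediate entities: at most max_paths * (max_len - 1) of them."""
    return {e for p in paths for e in p[1:-1]}


edges = [
    ("Alibaba", "investIn", "Retail"),
    ("Retail", "isInvestedBy", "JD.com"),
    ("Alibaba", "locatedIn", "China"),
    ("JD.com", "locatedIn", "China"),
    ("China", "hasCapital", "Beijing"),
]
L, N = 2, 5
paths = bounded_paths(edges, "Alibaba", "JD.com", max_len=L, max_paths=N)
ext = extended_entities(paths)
print(paths)        # [['Alibaba', 'China', 'JD.com'], ['Alibaba', 'Retail', 'JD.com']]
print(sorted(ext))  # ['China', 'Retail']
```

NetworkX's `all_simple_paths` with a `cutoff` argument would be an alternative to the hand-written DFS; the explicit version keeps both the $N$ and $L$ budgets visible.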

Figures (7)

  • Figure 1: An example of subgraph- and path-based explanations. For the target prediction (dashed line) $\left<Alibaba,\mathbf{isCompetitor},JD.com\right>$, the path-based explanation $\left<Alibaba,\mathbf{investIn}, Retail, \mathbf{isInvestedBy}, JD.com\right>$ cannot reliably support the fact $\left<Alibaba,\mathbf{isCompetitor},JD.com\right>$ because it leaves the locations of the two companies unclear. In contrast, the subgraph-based explanation introduces an additional fact: $Alibaba$ and $JD.com$ are both located in $China$. This fact allows an accurate deduction of the target prediction.
  • Figure 2: The KGExplainer framework comprises three modules: (1) Pre-training KGE models on the target KG; (2) Exploring subgraph-based explanations for the pre-trained KGE models via greedy search; (3) Distilling a subgraph structure evaluator from the pre-trained KGE models to assess the explored explanations quantitatively.
  • Figure 3: Details of exploring explanations. KGExplainer greedily searches for the top $n=2$ key entities per hop, from the source entity towards the target entity, within the enclosing subgraph, and then extracts the key subgraph induced by the identified entities.
  • Figure 4: Explanation performance of KGExplainer under different hyper-parameters on the Family-rr dataset.
  • Figure 5: The explanations explored by Kelpie, DRUM, PaGE-Link, and KGExplainer on the fact $\left<id:2, isFather, id:5\right>$.
  • ...and 2 more figures
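The search in Figure 3 can be sketched as follows, under stated assumptions. Starting from the source entity, each hop keeps the $n$ highest-importance unvisited neighbours of the current frontier, the search stops once the target entity becomes reachable, and the key subgraph is the set of edges induced by the kept entities. The importance values here are hand-set toy numbers (in KGExplainer they come from perturbing the KGE model), and `greedy_key_subgraph`, `max_hops`, and the example triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def greedy_key_subgraph(
    edges: List[Triple],
    head: str,
    tail: str,
    importance: Dict[str, float],
    n: int = 2,
    max_hops: int = 3,
) -> List[Triple]:
    """Hop by hop from `head`, keep the top-`n` most important unvisited neighbours,
    stop once `tail` is reachable, then return the subgraph induced by the kept entities."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for h, _, t in edges:
        adj[h].add(t)
        adj[t].add(h)
    selected: Set[str] = {head}
    frontier: Set[str] = {head}
    for _ in range(max_hops):
        candidates = {nb for node in frontier for nb in adj[node]} - selected
        if tail in candidates:  # target reached: stop expanding
            break
        # Highest importance first; ties broken alphabetically for determinism.
        ranked = sorted(candidates, key=lambda e: (-importance.get(e, 0.0), e))
        frontier = set(ranked[:n])
        if not frontier:  # no unvisited neighbours left
            break
        selected |= frontier
    selected.add(tail)
    return [(h, r, t) for (h, r, t) in edges if h in selected and t in selected]


# Toy enclosing subgraph for <Alibaba, isCompetitor, JD.com> with one distractor entity.
edges = [
    ("Alibaba", "investIn", "Retail"),
    ("Retail", "isInvestedBy", "JD.com"),
    ("Alibaba", "locatedIn", "China"),
    ("JD.com", "locatedIn", "China"),
    ("Alibaba", "headquarteredIn", "Hangzhou"),
]
importance = {"China": 1.0, "Retail": 1.0, "Hangzhou": 0.0}  # toy scores
key = greedy_key_subgraph(edges, "Alibaba", "JD.com", importance, n=2)
for triple in key:
    print(triple)  # the four Retail/China edges; the Hangzhou edge is pruned
```

With $n=1$ the same call keeps only the `China` branch, which shows how $n$ trades explanation size against coverage.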

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2