LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings

Ting Li, Yang Yang, Yipeng Yu, Liang Yao, Guoqing Chao, Ruifeng Xu

TL;DR

Knowledge graph embeddings (KGEs) are vulnerable to targeted perturbations, and existing attacks often lack interpretable explanations. LLMAtKGE combines structured prompting, semantics- and centrality-based candidate filtering, and a high-order adjacency (HoA) adapter so that an LLM can select attack targets and generate human-readable justifications, while preserving and integrating KG context via parameter preservation and updating. Empirical results on WN18RR and FB15k-237 across DistMult, ComplEx, ConvE, and TransE show that it outperforms black-box baselines and is competitive with white-box attacks, with ablations validating each component. The framework thus provides a practical, interpretable approach to adversarial attacks on KGEs, with potential implications for robustness evaluation and defense, and code is released for reproducibility.

Abstract

Adversarial attacks on knowledge graph embeddings (KGE) aim to disrupt the model's link prediction ability by removing or inserting triples. A recent black-box method has attempted to incorporate textual and structural information to enhance attack performance. However, it cannot generate human-readable explanations and generalizes poorly. In the past few years, large language models (LLMs) have demonstrated powerful capabilities in text comprehension, generation, and reasoning. In this paper, we propose LLMAtKGE, a novel LLM-based framework that selects attack targets and generates human-readable explanations. To provide the LLM with sufficient factual context under limited input constraints, we design a structured prompting scheme that explicitly formulates the attack as multiple-choice questions while incorporating KG factual evidence. To address the context-window limitation and hesitation issues, we introduce semantics-based and centrality-based filters, which compress the candidate set while preserving high recall of attack-relevant information. Furthermore, to efficiently integrate both semantic and structural information into the filter, we precompute high-order adjacency and fine-tune the LLM with a triple classification task to enhance filtering performance. Experiments on two widely used knowledge graph datasets demonstrate that our attack outperforms the strongest black-box baselines, provides explanations via reasoning, and shows competitive performance compared with white-box methods. Comprehensive ablation and case studies further validate its capability to generate explanations.
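
To make the multiple-choice formulation concrete, the following is a minimal sketch of how a deletion attack might be posed to an LLM. The function name `build_attack_prompt`, the prompt wording, and the toy triples are all hypothetical illustrations, not the paper's actual template.

```python
from string import ascii_uppercase


def build_attack_prompt(target, candidates, evidence):
    """Format a deletion-attack decision as a multiple-choice question.

    All arguments hold (head, relation, tail) triples. The wording below
    is an assumption made for illustration, not the paper's prompt.
    """
    if len(candidates) > len(ascii_uppercase):
        raise ValueError("too many candidates for single-letter options")

    def fmt(triple):
        h, r, t = triple
        return f"({h}, {r}, {t})"

    lines = [
        "Instruction: choose the one triple whose removal would most weaken "
        "the model's prediction of the target triple, then explain why.",
        f"Target: {fmt(target)}",
        "Known facts:",
    ]
    # KG factual evidence is listed before the options.
    lines += [f"- {fmt(e)}" for e in evidence]
    lines.append("Options:")
    # Each candidate triple becomes one lettered option.
    lines += [f"{ascii_uppercase[i]}. {fmt(c)}" for i, c in enumerate(candidates)]
    lines.append("Answer with a single option letter followed by a short explanation.")
    return "\n".join(lines)


# Toy example with made-up triples.
prompt = build_attack_prompt(
    target=("dog", "hypernym", "canine"),
    candidates=[("canine", "hypernym", "carnivore"), ("dog", "has_part", "paw")],
    evidence=[("canine", "member_of", "canidae")],
)
print(prompt)
```

Restricting the answer to a lettered option keeps the output easy to parse, which is one plausible reason to prefer a multiple-choice format over free-form generation. It also caps the number of candidates, which is where the filters come in.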

Paper Structure

This paper contains 35 sections, 7 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Comparison of current methods and our proposed LLMAtKGE. (a) Influence-based attacks typically rely on 1-hop triples as input and lack the ability to generate explanations. (b) Rule-based attacks exploit rules extracted from the KG to perform attacks, and derive explanations from the selected rules. (c) In an LLM-based attempt, providing even only 1-hop triples may still lead to context overflow and hesitation in the chain of thought. (d) Our LLMAtKGE guides the LLM's reasoning to derive the answer while generating human-readable explanations.
  • Figure 2: (a) The overview of LLMAtKGE. First, (b) the knowledge graph and (c) the triple graph are constructed to determine the initial candidates. The (d) semantics-based and (e) centrality-based filters are applied to shrink the candidate set. Then, following advanced prompt engineering techniques, the instruction $I$, example $E$, candidates $C$, and reference $R$ are concatenated as the input to the instruction LLM. The LLM reasons to generate answers and human-readable explanations, after which (g) deletion and (h) addition attacks are performed by removing triples or inserting new triples through entity replacement. (f) HoA efficiently tunes the LLM on triple classification, integrating multi-hop paths into the LLM to enhance the filtering performance.
  • Figure 3: Ablation results of HoA. Values closer to the center denote better attack performance.
  • Figure 4: Ablation study on the centrality-based filter. PR, BC, and CC denote PageRank, betweenness, and closeness centralities, respectively.
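
The centrality-based filter mentioned in the captions above can be illustrated with a small sketch. It assumes PageRank computed by power iteration on an undirected entity graph, and it scores each candidate triple by the sum of the PageRank of its head and tail. Both choices, along with the names `pagerank` and `centrality_filter`, are assumptions for illustration; the paper may compute centrality differently, for example on the triple graph.

```python
from collections import defaultdict


def pagerank(triples, damping=0.85, iters=50):
    """Power-iteration PageRank over the undirected entity graph of `triples`."""
    neighbors = defaultdict(set)
    for h, _, t in triples:
        neighbors[h].add(t)
        neighbors[t].add(h)
    nodes = list(neighbors)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node first receives the uniform teleport mass.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # Each node splits its damped rank equally among its neighbors.
            share = damping * rank[v] / len(neighbors[v])
            for u in neighbors[v]:
                new[u] += share
        rank = new
    return rank


def centrality_filter(candidates, graph_triples, k):
    """Keep the k candidates with the highest head + tail PageRank (assumed scoring)."""
    rank = pagerank(graph_triples)

    def score(triple):
        h, _, t = triple
        return rank.get(h, 0.0) + rank.get(t, 0.0)

    return sorted(candidates, key=score, reverse=True)[:k]


# Toy graph: a hub with three neighbors, plus one extra edge a-d.
graph = [("hub", "r", "a"), ("hub", "r", "b"), ("hub", "r", "c"), ("a", "r", "d")]
candidates = [("a", "r", "d"), ("hub", "r", "b"), ("hub", "r", "a")]
kept = centrality_filter(candidates, graph, k=2)
print(kept)
```

On this toy graph the two triples touching the hub are kept and the peripheral triple is dropped. Swapping `pagerank` for a betweenness or closeness routine would correspond to the BC and CC variants named in Figure 4.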