KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs
Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia, Lina Wang
TL;DR
KGPA introduces a knowledge-graph-based framework to evaluate the adversarial robustness of large language models across domain-diverse knowledge graphs, eliminating reliance on manually annotated benchmarks. It comprises modules that convert KG triplets to original prompts (T2P), generate adversarial prompts (KGB-FSA and APGP), and refine prompt quality with a Prompt Refinement Engine (PRE) using LLMScore, guided by a tau_llm threshold. Robustness is quantified via NRA, RRA, and ASR across general and specialized knowledge graphs and models such as GPT-3.5-turbo, GPT-4-turbo, and GPT-4o, revealing that robustness ranks as GPT-4-turbo > GPT-4o > GPT-3.5-turbo and that domain knowledge influences performance. The approach demonstrates lower resource costs relative to benchmark-heavy frameworks while delivering actionable insights into cross-domain robustness and the effectiveness of different prompt-generation strategies.
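The three metrics named above can be illustrated with a small sketch. This summary does not define NRA, RRA, or ASR, so the readings used here are assumptions: NRA is taken as accuracy on original prompts, RRA as accuracy on adversarial prompts, and ASR as the fraction of originally correct answers that the attack turns incorrect. The `Trial` record and `robustness_metrics` function are hypothetical names, not part of KGPA.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trial:
    """One evaluated triplet (hypothetical record layout)."""
    label: str             # ground-truth label of the prompt
    original_pred: str     # model answer on the original prompt
    adversarial_pred: str  # model answer on the adversarial prompt


def robustness_metrics(trials: List[Trial]) -> Dict[str, float]:
    """Compute NRA, RRA, and ASR under the assumed definitions above."""
    if not trials:
        raise ValueError("at least one trial is required")

    n = len(trials)
    natural_correct = [t for t in trials if t.original_pred == t.label]
    robust_correct = [t for t in trials if t.adversarial_pred == t.label]
    # An attack "succeeds" only if the model was right before it and wrong after.
    flipped = [t for t in natural_correct if t.adversarial_pred != t.label]

    return {
        "NRA": len(natural_correct) / n,
        "RRA": len(robust_correct) / n,
        # Assumed convention: ASR is 0.0 when there is nothing to attack.
        "ASR": len(flipped) / len(natural_correct) if natural_correct else 0.0,
    }


trials = [
    Trial(label="true", original_pred="true", adversarial_pred="true"),
    Trial(label="true", original_pred="true", adversarial_pred="false"),
    Trial(label="false", original_pred="true", adversarial_pred="true"),
    Trial(label="false", original_pred="false", adversarial_pred="false"),
]
metrics = robustness_metrics(trials)
```

Under these assumed definitions, a more robust model shows a small gap between NRA and RRA and a low ASR.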
Abstract
Existing frameworks for assessing the robustness of large language models (LLMs) depend too heavily on specific benchmarks, which increases costs and, because of dataset limitations, leaves LLM performance in professional domains unevaluated. This paper proposes a framework that systematically evaluates the robustness of LLMs under adversarial attack scenarios by leveraging knowledge graphs (KGs). Our framework generates original prompts from the triplets of knowledge graphs and creates adversarial prompts by poisoning, assessing the robustness of LLMs through the results of these adversarial attacks. We systematically evaluate the effectiveness of this framework and its modules. Experiments show that the adversarial robustness of the ChatGPT family ranks as GPT-4-turbo > GPT-4o > GPT-3.5-turbo, and that the robustness of large language models is influenced by the professional domains in which they operate.
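As a rough illustration of turning a knowledge-graph triplet into a prompt, the sketch below uses a fixed template. It is a toy under stated assumptions, not the paper's T2P, KGB-FSA, or APGP modules: the `Triplet` type, the template wording, and the reading of "poisoning" as swapping the tail entity so that the statement becomes false are all hypothetical.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """A (head, relation, tail) fact from a knowledge graph."""
    head: str
    relation: str
    tail: str


def triplet_to_prompt(t: Triplet) -> str:
    """Render a triplet as a true/false question (assumed template)."""
    statement = f"{t.head} {t.relation.replace('_', ' ')} {t.tail}."
    return f'Is the following statement true or false? "{statement}"'


def poison_triplet(t: Triplet, replacement_tail: str) -> Triplet:
    """Corrupt the tail entity; an assumed, simplified form of poisoning."""
    if replacement_tail == t.tail:
        raise ValueError("replacement must differ from the original tail")
    return t._replace(tail=replacement_tail)


fact = Triplet("Paris", "is_the_capital_of", "France")
original_prompt = triplet_to_prompt(fact)
poisoned_prompt = triplet_to_prompt(poison_triplet(fact, "Germany"))
```

Because prompts are derived from triplets rather than from hand-labeled examples, the same procedure can be pointed at a general or a domain-specific knowledge graph without new annotation.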
