ThinkEval: Practical Evaluation of Knowledge Leakage in LLM Editing using Thought-based Knowledge Graphs
Manit Baser, Dinil Mon Divakaran, Mohan Gurusamy
TL;DR
ThinkEval addresses indirect knowledge leakage in LLM editing by building knowledge graphs derived from chain-of-thought (CoT) reasoning and using them to analyze how edits propagate through causal chains. It introduces deep editing and the IFR metric, along with the KnowGIC benchmark of 1,406 multi-step chains, to systematically evaluate editing techniques across multiple models. The study finds that state-of-the-art methods trade direct edit efficacy against significant leakage and ripple effects, underscoring the need for holistic editing approaches and sequential testing. By providing a scalable framework and benchmark, ThinkEval offers practical guidance for safer, more reliable model editing in high-stakes domains, with potential extensions to non-factual and procedural knowledge.
Abstract
Robust model-editing techniques are essential for deploying large language models (LLMs) in practical applications, as they offer cost-effective ways to deal with challenges such as privacy breaches, bias, and the spread of misinformation. For example, an LLM-based healthcare assistant may need to update outdated or incorrect knowledge to prevent harmful recommendations. However, many editing techniques focus on isolated facts and therefore fail to prevent indirect knowledge leakage: the unintended reconstruction of edited-out information through persistent causal links and contextual relationships. To assist users in selecting the right editing technique, we develop and present ThinkEval, a framework to systematically quantify indirect knowledge leakage and ripple effects in model editing. ThinkEval builds and employs specialized knowledge graphs to analyze the causal structure of facts before and after editing. To support this approach, we present KnowGIC, a benchmark dataset comprising multi-step reasoning paths that measure these knowledge transformation effects. We evaluate five editing techniques (AlphaEdit, RECT, ROME, MEMIT, and PRUNE) across multiple LLMs. Our results show that these techniques struggle to balance indirect fact suppression with the preservation of related knowledge, compromising the contextual integrity of a model's knowledge. Our dataset is available at: https://github.com/manitbaser/KnowGIC.
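To make the notion of indirect knowledge leakage concrete, the following is a minimal sketch and not the ThinkEval implementation. It assumes a toy knowledge graph stored as (subject, relation, object) triples and enumerates the multi-step chains that still connect the subject of an edited fact to its original object once the direct triple is removed. The entities, relation names, the function `indirect_chains`, and the `max_hops` parameter are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy triples (subject, relation, object); all names are illustrative placeholders.
triples = [
    ("Book X", "author", "Author Y"),            # the fact being edited out
    ("Book X", "main_character", "Character Z"),
    ("Character Z", "created_by", "Author Y"),
    ("Book X", "publisher", "Publisher P"),
    ("Publisher P", "located_in", "City C"),
]
edited = ("Book X", "author", "Author Y")


def indirect_chains(triples, edited, max_hops=3):
    """Enumerate chains from the edited subject to the edited object
    that avoid the edited triple itself (assumed leakage routes)."""
    subject, _, target = edited

    # Adjacency list built from every triple except the edited one.
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        if (subj, rel, obj) != edited:
            graph[subj].append((rel, obj))

    chains = []

    def walk(node, path, visited):
        # Stop extending once the chain already has max_hops steps.
        if len(path) >= max_hops:
            return
        for rel, nxt in graph.get(node, []):
            if nxt in visited:
                continue  # avoid cycles
            step = path + [(node, rel, nxt)]
            if nxt == target:
                chains.append(step)
            else:
                walk(nxt, step, visited | {nxt})

    walk(subject, [], {subject})
    return chains


chains = indirect_chains(triples, edited)
for chain in chains:
    print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in chain))
```

In this toy setting, each returned chain is a route by which the edited-out object could still be reconstructed. An evaluation in the spirit described above would presumably query the edited model along such chains, but how ThinkEval does so is not specified in this section.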
