Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness

Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu, Ruihan Wu, Haoteng Yin, Mohsen Ghassemi, Yifan Li, Vamsi K. Potluru, Eli Chien, Kamalika Chaudhuri, Olgica Milenkovic, Pan Li

TL;DR

This work addresses the shortcomings of surface-level unlearning evaluation by modeling LLM knowledge as a confidence-aware knowledge graph $\mathcal{G}= (\mathcal{E}, \mathcal{R}, \mathcal{T}_{\mathcal{U}})$ with quadruples $t=(s,r,o,u)$ and defining unlearning via an intrinsic judge $f(\mathcal{G}, e)$. It introduces a two-stage evaluation protocol that (i) extracts a correlated supporting subgraph $G_e^{\mathcal{A}}$ anchored in a real-world reference $\mathcal{G}_{\text{ref}}$ and (ii) uses a carefully calibrated LLM judge, complemented by human validation, to assess whether a target triple $e$ remains inferable. Experiments on YAGO3-10-based targets across two open-source LLMs show that correlated knowledge can substantially sustain inferences even after direct deletion, revealing that evaluations which ignore these interdependencies overestimate how much many unlearning methods actually forget. The paper contributes a practical evaluation framework and benchmark for knowledge unlearning, highlighting the need to account for correlation structure and confidence in future unlearning research and deployment.
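To make the quadruple representation concrete, the following is a minimal sketch of a confidence-aware knowledge graph and a simple supporting-subgraph lookup around a target triple. It is not the paper's extraction procedure: the names `Quad` and `supporting_subgraph`, the hop-based neighborhood rule, the `min_conf` threshold, and the reading of $u$ as a score in $[0, 1]$ where higher means more confident are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Quad:
    """One quadruple t = (s, r, o, u): subject, relation, object, confidence."""
    s: str
    r: str
    o: str
    u: float  # ASSUMPTION: confidence in [0, 1], higher = more confident


def supporting_subgraph(quads, target, hops=1, min_conf=0.5):
    """Return confident quadruples within `hops` of the target's subject/object.

    ASSUMPTION: a plain breadth-first neighborhood stands in for the paper's
    correlated-subgraph extraction. The target triple itself is excluded,
    mimicking the case where the direct fact has been deleted.
    """
    s, _, o = target
    by_entity = defaultdict(list)
    for q in quads:
        if q.u >= min_conf and (q.s, q.r, q.o) != target:
            by_entity[q.s].append(q)
            by_entity[q.o].append(q)

    frontier = {s, o}
    seen_entities = set(frontier)
    seen_quads = set()
    selected = []
    for _ in range(hops):
        next_frontier = set()
        for entity in sorted(frontier):  # sorted for a deterministic order
            for q in by_entity[entity]:
                if q not in seen_quads:
                    seen_quads.add(q)
                    selected.append(q)
                for neighbor in (q.s, q.o):
                    if neighbor not in seen_entities:
                        seen_entities.add(neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    return selected


# Hypothetical toy data, not taken from YAGO3-10.
quads = [
    Quad("Alice", "bornIn", "Paris", 0.9),      # the target fact
    Quad("Alice", "livesIn", "Paris", 0.8),
    Quad("Paris", "locatedIn", "France", 0.95),
    Quad("Alice", "worksAt", "Acme", 0.2),      # below the confidence threshold
    Quad("France", "partOf", "Europe", 0.9),
]
target = ("Alice", "bornIn", "Paris")
one_hop = supporting_subgraph(quads, target, hops=1)
two_hop = supporting_subgraph(quads, target, hops=2)
```

In this toy example the one-hop subgraph still contains `(Alice, livesIn, Paris)`, which is the kind of correlated fact a judge could use to infer the deleted target; in the framework described above, that inference step is performed by a calibrated LLM judge rather than by code like this.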

Abstract

Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). However, existing approaches predominantly focus on the explicit removal of isolated facts, often overlooking latent inferential dependencies and the non-deterministic nature of knowledge within LLMs. Consequently, facts presumed forgotten may persist implicitly through correlated information. To address these challenges, we propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge by representing relevant factual contexts as knowledge graphs with associated confidence scores. We further develop an inference-based evaluation protocol leveraging powerful LLMs as judges; these judges reason over the extracted knowledge subgraph to determine unlearning success. Our LLM judges utilize carefully designed prompts and are calibrated against human evaluations to ensure their trustworthiness and stability. Extensive experiments on our newly constructed benchmark demonstrate that our framework provides a more realistic and rigorous assessment of unlearning performance. Moreover, our findings reveal that current evaluation strategies tend to overestimate unlearning effectiveness. Our code is publicly available at https://github.com/Graph-COM/Knowledge_Unlearning.git.

Paper Structure

This paper contains 34 sections, 30 figures, 3 tables, 1 algorithm.

Figures (30)

  • Figure 1: An Illustration of Knowledge Unlearning Framework.
  • Figure 2: Illustration of supporting subgraph.
  • Figure 3: The entire knowledge probing process.
  • Figure 4: Illustration of Instructions for LLM Judge.
  • Figure 5: Effectiveness of Entropy Threshold $u^*$ (Confidence) on UES.
  • ...and 25 more figures

Theorems & Definitions (1)

  • Definition 1: Knowledge Unlearning