Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning

Minseok Choi, ChaeHun Park, Dohyun Lee, Jaegul Choo

TL;DR

This work examines the limitations of current unlearning techniques in erasing a particular type of indirect prompt, multi-hop queries, and proposes MUNCH, a simple uncertainty-based approach that breaks multi-hop queries down into subquestions and uses the uncertainty of the unlearned model in final decision-making.

Abstract

Large language models (LLMs) serve as giant information stores, often including personal or copyrighted data, and retraining them from scratch is not a viable option. This has led to the development of various fast, approximate unlearning techniques to selectively remove knowledge from LLMs. Prior research has largely focused on minimizing the probabilities of specific token sequences by reversing the language modeling objective. However, these methods still leave LLMs vulnerable to adversarial attacks that exploit indirect references. In this work, we examine the limitations of current unlearning techniques in effectively erasing a particular type of indirect prompt: multi-hop queries. Our findings reveal that existing methods fail to completely remove multi-hop knowledge when one of the intermediate hops is unlearned. To address this issue, we propose MUNCH, a simple uncertainty-based approach that breaks down multi-hop queries into subquestions and leverages the uncertainty of the unlearned model in final decision-making. Empirical results demonstrate the effectiveness of our framework, and MUNCH can be easily integrated with existing unlearning techniques, making it a flexible and useful solution for enhancing unlearning processes.

Paper Structure

This paper contains 31 sections, 5 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Motivation for multi-hop knowledge unlearning. After Elon Musk (i.e., "the user") requests his personal information to be removed from the LLM, existing unlearning methods often succeed in deleting direct, single-hop facts but fail on indirect, multi-hop facts that entail one or a few of the unlearned facts.
  • Figure 2: Scaling performance of various unlearning methods using Llama-3.1-8B-Instruct across different proportions of data for the forget set (1%, 5%, and 10%). Models consistently preserve the ability to unlearn and retain single-hop facts with scaling. While unlearning multi-hop facts seems to improve with scaling, as evidenced by the performance drop, a similar decline is also observed in the retain set. This suggests that the effect may be attributed to catastrophic forgetting of broader information rather than a genuine improvement in unlearning multi-hop facts.
  • Figure 3: Performance of the GA+RT method for varying values of the loss scaling factor $\alpha$. Llama appears to be more sensitive than Phi to the value of $\alpha$ when balancing unlearning and retaining.
  • Figure 4: Overview of the proposed Munch framework. Munch begins by breaking down a multi-hop question into a sequence of subquestions, where each subquestion is passed to the original model to generate provisional answers. Then, Munch leverages the unlearned model to assess the uncertainty of each predicted answer by calculating uncertainty scores. If any subquestion yields a high uncertainty score, i.e., one exceeding a predefined threshold, Munch responds with a rejection (e.g., "I don't know"). Otherwise, the final response is based on the last intermediate answer in the sequence.
  • Figure 5: Prompt used in Munch to decompose a multi-hop question into a series of subquestions using GPT-4o. It consists of a system prompt followed by three fixed demonstration examples.
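
The decision procedure described in the Figure 4 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `munch_answer`, the `{prev}` placeholder used to pass an earlier answer into a later subquestion, and the dictionary-backed stand-ins for the original and unlearned models are all hypothetical. The entities, uncertainty scores, and threshold are made-up toy values, and the decomposition step (done with GPT-4o in the paper) is assumed to have already produced the subquestions.

```python
from typing import Callable, List

REJECTION = "I don't know"  # rejection string quoted in the Figure 4 caption


def munch_answer(
    subquestions: List[str],
    answer_with_original: Callable[[str], str],
    uncertainty_with_unlearned: Callable[[str, str], float],
    threshold: float,
) -> str:
    """Return the final answer, or a rejection if any hop is too uncertain.

    `subquestions` may contain the hypothetical placeholder "{prev}", which is
    filled with the provisional answer to the preceding subquestion.
    """
    if not subquestions:
        raise ValueError("expected at least one subquestion")

    # Step 1: the original model produces a provisional answer for each hop.
    hops = []  # list of (resolved subquestion, provisional answer)
    previous = ""
    for template in subquestions:
        subquestion = template.format(prev=previous)
        previous = answer_with_original(subquestion)
        hops.append((subquestion, previous))

    # Step 2: the unlearned model scores the uncertainty of each answer.
    for subquestion, answer in hops:
        if uncertainty_with_unlearned(subquestion, answer) > threshold:
            return REJECTION  # one uncertain hop is enough to reject

    # Step 3: otherwise respond with the last intermediate answer.
    return hops[-1][1]


# Toy stand-ins for the two models (fictional entities, made-up scores).
ORIGINAL_FACTS = {
    "Who founded ExampleCorp?": "Alice Example",
    "Which city was Alice Example born in?": "Springfield",
}
UNLEARNED_UNCERTAINTY = {
    ("Who founded ExampleCorp?", "Alice Example"): 0.1,
    # Assume this single-hop fact was the unlearning target.
    ("Which city was Alice Example born in?", "Springfield"): 0.9,
}

toy_subquestions = ["Who founded ExampleCorp?", "Which city was {prev} born in?"]


def toy_original(subquestion: str) -> str:
    return ORIGINAL_FACTS[subquestion]


def toy_unlearned(subquestion: str, answer: str) -> float:
    return UNLEARNED_UNCERTAINTY[(subquestion, answer)]


print(munch_answer(toy_subquestions, toy_original, toy_unlearned, threshold=0.5))
```

With the toy threshold of 0.5, the second hop is flagged as uncertain and the sketch returns the rejection string; raising the threshold above 0.9 would let the last provisional answer through. How the uncertainty score is actually computed from the unlearned model is not specified in this excerpt, so the sketch treats it as an opaque callable.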