Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing

Tianci Liu, Ruirui Li, Zihan Dong, Hui Liu, Xianfeng Tang, Qingyu Yin, Linjun Zhang, Haoyu Wang, Jing Gao

TL;DR

This work identifies heterogeneous token overfitting (HTO) as a key problem in knowledge editing (KE): different tokens in the edited knowledge overfit at different rates, which degrades the edited model's ability to reason about the new knowledge. It introduces OverTone, a token-level smoothing objective that adaptively refines the target distribution for each token and uses a clipped forward KL divergence to guide updates. The objective adds negligible computational overhead and is closely related to direct preference optimization (DPO) without requiring preference data. The method is shown to be compatible with four KE approaches and to improve reliability, generality, portability, and locality across two LLMs and several benchmarks, including continual editing scenarios. The findings offer a principled, model-agnostic mechanism for mitigating overfitting in KE, with potential applicability to related tasks such as machine unlearning.

Abstract

Large language models (LLMs) have achieved remarkable performance on various natural language tasks. However, they are trained on static corpora, and their knowledge can quickly become outdated in a fast-changing world. This motivates the development of knowledge editing (KE), which updates specific knowledge in an LLM without changing unrelated knowledge or compromising its pre-trained capabilities. Previous efforts sought to update a small number of an LLM's parameters and proved effective for making selective updates. Nonetheless, the edited LLM often exhibits a degraded ability to reason about the new knowledge. In this work, we identify a key issue: heterogeneous token overfitting (HTO), where the LLM overfits different tokens in the provided knowledge at varying rates. To tackle this, we propose OverTone, a token-level smoothing method that mitigates HTO by adaptively refining the target distribution. Theoretically, OverTone offers better parameter updates with negligible computational overhead. It also induces an implicit direct preference optimization (DPO) objective without requiring preference data pairs. Extensive experiments across four editing methods, two LLMs, and diverse scenarios demonstrate the effectiveness and versatility of our method.

Paper Structure

This paper contains 29 sections, 12 theorems, 45 equations, 3 figures, 5 tables, and 1 algorithm.

Key Result

Proposition 3.1

The OverTone loss generalizes the cross-entropy (CE) loss and reduces to it when $\epsilon = 0$ and $\lambda = 1$.
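
The reduction to CE can be checked numerically on a toy example. The sketch below is not the paper's definition of the OverTone loss, which this summary does not reproduce. It assumes a hypothetical form in which $\lambda$ mixes the one-hot label with a prior distribution to build the target, and $\epsilon$ is a threshold below which the forward KL term is clipped to zero. All function names and the toy logits are illustrative. Under those assumptions, setting $\lambda = 1$ makes the target one-hot, so the forward KL equals the CE loss, and $\epsilon = 0$ disables the clipping.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Standard CE loss for a single token with a one-hot target."""
    return -math.log(probs[label])

def smoothed_target(prior, label, lam):
    """Hypothetical smoothing: lam * one-hot + (1 - lam) * prior."""
    return [lam * (1.0 if i == label else 0.0) + (1.0 - lam) * p
            for i, p in enumerate(prior)]

def forward_kl(target, probs):
    """KL(target || probs), skipping zero-mass entries of the target."""
    return sum(t * math.log(t / q) for t, q in zip(target, probs) if t > 0.0)

def clipped_smoothing_loss(probs, prior, label, lam, eps):
    """Assumed clipping rule: tokens already within eps contribute no loss."""
    kl = forward_kl(smoothed_target(prior, label, lam), probs)
    return kl if kl > eps else 0.0

# Toy 3-token vocabulary; the logits are made up for illustration.
probs = softmax([2.0, 0.5, -1.0])   # current model distribution
prior = softmax([0.1, 0.2, 0.3])    # stand-in for a reference distribution
label = 0

ce = cross_entropy(probs, label)
reduced = clipped_smoothing_loss(probs, prior, label, lam=1.0, eps=0.0)
smoothed = clipped_smoothing_loss(probs, prior, label, lam=0.7, eps=0.0)
print(f"CE = {ce:.6f}, lam=1/eps=0 loss = {reduced:.6f}, lam=0.7 loss = {smoothed:.6f}")
```

With `lam=1.0` and `eps=0.0` the two printed losses agree up to floating-point error, while `lam=0.7` gives a different value because the target is no longer one-hot.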

Figures (3)

  • Figure 1: Change in the average loss on ground-truth answers to generality (rephrased, left) and portability (reasoning, right) questions.
  • Figure 2: Token-level initial loss and UD (negative indicates overfitted). Dashed lines mark the mean values.
  • Figure 3: Continual editing performance under different sequence lengths $T$. Solid and transparent bars show performance with and without OverTone. The unfilled area marks the performance gap. ROME and MEMIT did not use OverTone.

Theorems & Definitions (22)

  • Proposition 3.1
  • Proposition 3.2
  • Theorem 3.3 (Informal)
  • Theorem 3.4
  • Proposition 1.1
  • Proposition 1.2
  • Lemma 1.3
  • proof
  • proof
  • proof
  • ...and 12 more