REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing

Haitian Zhong, Yuhuan Liu, Ziyang Xu, Guofan Liu, Qiang Liu, Shu Wu, Zhe Zhao, Liang Wang, Tieniu Tan

TL;DR

REACT addresses overfitting in LLM knowledge editing by decoupling edits into latent-representation extraction and controllable perturbations of hidden states. It extracts a compact belief-shift vector from stimuli using PCA and a learnable linear transform, then applies gated, magnitude-controlled perturbations to the Transformer hidden states based on a pre-trained classifier. The method achieves balanced improvements across reliability, locality, generality, and portability on COUNTERFACT and MQuAKE, and significantly reduces overfitting while preserving generalization on EVOKE. This approach enables precise, context-aware knowledge updates without heavy parameter retraining, offering a practical path toward robust knowledge editing in large language models.

Abstract

Large language model editing methods frequently suffer from overfitting, wherein factual updates can propagate beyond their intended scope, overemphasizing the edited target even when it is contextually inappropriate. To address this challenge, we introduce REACT (Representation Extraction And Controllable Tuning), a unified two-phase framework designed for precise and controllable knowledge editing. In the first phase, we use tailored stimuli to extract latent factual representations and apply Principal Component Analysis with a simple learnable linear transformation to compute a directional "belief shift" vector for each instance. In the second phase, we apply controllable perturbations to hidden states using the obtained vector with a magnitude scalar, gated by a pre-trained classifier that permits edits only when contextually necessary. Experiments on the EVOKE benchmark demonstrate that REACT significantly reduces overfitting across nearly all evaluation metrics, and experiments on COUNTERFACT and MQuAKE show that our method preserves balanced basic editing performance (reliability, locality, and generality) under diverse editing scenarios.
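The two phases described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`belief_shift_direction`, `gated_perturb`), the pooled-PCA construction, the optional matrix `W` standing in for the learnable linear transformation, the additive form of the perturbation, and the 0.5 gate threshold are all assumptions made for illustration. The hidden states are synthetic toy data rather than activations from a real model.

```python
import numpy as np

def belief_shift_direction(h_pos, h_neg, W=None):
    """Phase 1 (sketch): PCA over stimulus hidden states.

    h_pos, h_neg: (n_stimuli, d) arrays of hypothetical hidden states for
    stimuli asserting the new fact and the old fact, respectively.
    W: optional (d, d) matrix standing in for the learnable linear transform.
    """
    pooled = np.concatenate([h_pos, h_neg], axis=0)
    centered = pooled - pooled.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # The sign of a principal component is arbitrary; orient it old -> new.
    mean_diff = h_pos.mean(axis=0) - h_neg.mean(axis=0)
    if direction @ mean_diff < 0:
        direction = -direction
    if W is not None:
        direction = W @ direction
    return direction / np.linalg.norm(direction)

def gated_perturb(hidden, shift, alpha, gate_prob, threshold=0.5):
    """Phase 2 (sketch): add alpha * shift only when the gate opens.

    gate_prob is the (assumed) probability from a pre-trained classifier
    that the current context actually requires the edit.
    """
    gate = 1.0 if gate_prob >= threshold else 0.0
    return hidden + gate * alpha * shift

# Toy usage with synthetic hidden states separated along one axis.
rng = np.random.default_rng(0)
d = 8
true_dir = np.zeros(d)
true_dir[0] = 1.0
h_old = 0.1 * rng.standard_normal((32, d))
h_new = 0.1 * rng.standard_normal((32, d)) + 2.0 * true_dir

shift = belief_shift_direction(h_new, h_old)
h = rng.standard_normal(d)
edited = gated_perturb(h, shift, alpha=4.0, gate_prob=0.9)     # gate open
untouched = gated_perturb(h, shift, alpha=4.0, gate_prob=0.1)  # gate closed
```

In this sketch the gate is what limits over-application: a query the classifier judges unrelated to the edit leaves the hidden state unchanged, while `alpha` controls how strongly a related query is pushed along the extracted direction.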

Paper Structure

This paper contains 63 sections, 22 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Illustration of overfitting in LLM editing. Overfitting occurs when the model disproportionately emphasizes the edited target fact, even in contexts irrelevant to the edit. As shown on the right side, after editing the fact about Luka Doncic's team to "Lakers," the overfitted model incorrectly assigns high probability to "Lakers" even for a query about Doncic's teammates.
  • Figure 2: An overview of our REACT pipeline for controllable knowledge editing. We first construct stimuli prompts and feed them into the LLM to extract layer-wise representations, which are then processed via PCA and an MLP to isolate the key “belief shift” vector. We then apply a controllable perturbation (using learned scalar factors) to the model’s hidden states. The pre-trained classifier determines when the edits should occur.
  • Figure 3: Editing results on COUNTERFACT and MQuAKE-CF-v2, shown as a radar chart. Detailed results can be found in the appendix.
  • Figure 4: Editing results on EVOKE, shown as a radar chart. Values prefixed with “100-” are reported as 100 minus the original metric value. Results beginning with “L:” correspond to the Llama 3.1 model, and those beginning with “Q:” to the Qwen 2.5 model. Detailed results can be found in the appendix.