An Information-Theoretic Framework for Robust Large Language Model Editing

Qizhou Chen, Chengyu Wang, Taolin Zhang, Xiaofeng He

TL;DR

This work tackles the challenge of updating large language models with targeted edits without full retraining. By framing model editing through an information bottleneck lens, the authors derive ITM, SG, and IL constraints and implement them in the Information Bottleneck Knowledge Editor (IBKE). IBKE uses gradient-based latent encoding via hypernetworks to produce compact representations that guide selective, generalizable updates, achieving superior generality while preserving locality across multiple backbones and benchmarks. The approach is validated on four editing datasets and four LLM architectures, demonstrating robust, open-domain knowledge editing with principled trade-offs and avenues for future improvements such as low-rank updates and lifelong editing.

Abstract

Large Language Models (LLMs) have become indispensable tools in science, technology, and society, enabling transformative advances across diverse fields. However, errors or outdated information within these models can undermine their accuracy and restrict their safe deployment. Developing efficient strategies for updating model knowledge without the expense and disruption of full retraining remains a critical challenge. Current model editing techniques frequently struggle to generalize corrections beyond narrow domains, leading to unintended consequences and limiting their practical impact. Here, we introduce a novel framework for editing LLMs, grounded in information bottleneck theory. This approach precisely compresses and isolates the essential information required for generalizable knowledge correction while minimizing disruption to unrelated model behaviors. Building upon this foundation, we present the Information Bottleneck Knowledge Editor (IBKE), which leverages compact latent representations to guide gradient-based updates, enabling robust and broadly applicable model editing. We validate IBKE's effectiveness across multiple LLM architectures and standard benchmark tasks, demonstrating state-of-the-art accuracy and improved generality and specificity of edits. These findings establish a theoretically principled and practical paradigm for open-domain knowledge editing, advancing the utility and trustworthiness of LLMs in real-world applications.
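
For orientation, the generic information bottleneck objective that this framing builds on can be sketched as follows. This is the textbook form under one common sign convention, not necessarily the exact objective optimized by IBKE. The symbols $X$ (input, e.g. an edit sample), $Y$ (prediction target), and $Z$ (compact latent representation) are illustrative assumptions rather than the paper's notation, and $\beta$ denotes a trade-off coefficient.

$$
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
$$

Minimizing $I(X; Z)$ compresses the input, while the $\beta \, I(Z; Y)$ term rewards keeping the information that predicts the target. Under this convention, a larger $\beta$ favors prediction over compression.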

Paper Structure

This paper contains 18 sections, 25 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Comparison of two model editing paradigms with IBKE. Direct Editing (DE) is based solely on the edit sample as its prior, while Edit Training (ET) leverages richer priors; when combined with the IB, ET achieves improved generalization. The bar chart presents the top three prediction probabilities of the model after editing (using "E1", "E2", and "E3") for each corresponding sample. Results for the three editing paradigms are shown as follows: ROME (for DE), IBKE without IB, and IBKE with IB, all implemented using the Qwen3-1.7B backbone model. For illustration, Medicine (Domain A) and Chemistry (Domain B) are featured as example edit domains, with training performed on the Medicine domain. An effective model editor should correct the model’s responses to generalization (Gen.) samples while preserving accuracy for locality (Loc.) samples.
  • Figure 2: The trade-off between generality and locality is illustrated for various editing methods and model backbones. Each data point represents the average generality and locality scores achieved by a given editor, computed as the mean over the UniEdit, MQuAKE, and CounterFact benchmark datasets.
  • Figure 3: Hyperparameter search for the learnable sequence length $l_m$ and the IB trade-off coefficient $\beta$, using GPT2-XL as the backbone. IBKEs with different configurations are trained on UniEdit, and the results show the average performance across the four datasets.
  • Figure 4: Ablation study of the IB mechanism and the scale factor (SF) $f_{W_{s}}(\tilde{s}_i)$, using GPT2-XL as the backbone. The four IBKE variants with different configurations are trained on the few-shot augmented data. Legends marked with a "–" sign indicate that the corresponding module has been removed.
  • Figure 5: Performance of IBKE with and without the IB mechanism, trained on five domains from different sectors in UniEdit, using Qwen3-1.7B as the backbone. (a) shows the overall performance of each training instance across the 25 domains in UniEdit, where the vertical axis represents the 25 test domains, and the horizontal axis represents the training domains. (b) shows the average generality over the five training instances across different combinations of criteria and hop counts in UniEdit, where the abbreviations denote Rephrase (Rep), Object Alias (OA), Subject Alias (SA), Multi-Hop (MH), and Relation Reverse (RR).
  • ...and 3 more figures