InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing

Shuaiyi Li; Zhisong Zhang; Yang Deng; Chenlong Deng; Tianqing Fang; Hongming Zhang; Haitao Mi; Dong Yu; Wai Lam

TL;DR

InComeS addresses the inefficiency and context-window limits of in-context learning for model editing by compressing each edit into the KV cache of a gist token and adding cross-attention-based selection over the resulting gist pool. Meta-training aligns the compressed gist representations with the editing objective through token-weighted cross-entropy and distillation, so that relevant edits can still be retrieved as the edit batch grows. Across multi-hop, natural-language, and portability-focused benchmarks, InComeS outperforms strong baselines in complex editing scenarios, scales better with larger edit batches, and is more efficient. By reducing context requirements, the approach makes knowledge updates to large language models more flexible and scalable in deployment.
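
The token-weighted cross-entropy mentioned above can be illustrated with a generic sketch. This summary does not specify how the paper assigns weights, so the function `token_weighted_ce` and its `weights` argument are hypothetical; the code only shows the general form of a per-token weighted negative log-likelihood, not the authors' training objective.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_weighted_ce(logits_per_token, targets, weights):
    """Weighted mean of per-token negative log-likelihoods.

    logits_per_token: one list of vocabulary logits per target position.
    targets: index of the correct token at each position.
    weights: hypothetical per-token weights (assumed, not from the paper).
    """
    total = 0.0
    for logits, target, w in zip(logits_per_token, targets, weights):
        total += -w * log_softmax(logits)[target]
    return total / sum(weights)


# Toy usage: two positions, vocabulary of size 2, second token weighted 3x.
loss = token_weighted_ce([[2.0, 0.0], [0.0, 0.0]], [0, 1], [1.0, 3.0])
```

Raising a token's weight makes its prediction error count more in the average, which is one way to emphasize the tokens that carry the edited fact.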

Abstract

Although existing model editing methods perform well in recalling exact edit facts, they often struggle in complex scenarios that require deeper semantic understanding rather than mere knowledge regurgitation. By leveraging the strong contextual reasoning abilities of large language models (LLMs), in-context learning (ICL) is a promising editing method that comprehends edit information through context encoding. However, this method is constrained by the limited context window of LLMs, leading to degraded performance and efficiency as the number of edits increases. To overcome this limitation, we propose InComeS, a flexible framework that enhances LLMs' ability to process editing contexts through explicit compression and selection mechanisms. Specifically, InComeS compresses each editing context into the key-value (KV) cache of a special gist token, enabling efficient handling of multiple edits without being restricted by the model's context window. Furthermore, specialized cross-attention modules are added to dynamically select the most relevant information from the gist pool, enabling adaptive and effective utilization of edit information. We conduct experiments on diverse model editing benchmarks with various editing formats, and the results demonstrate the effectiveness and efficiency of our method.
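
The selection step described in the abstract can be pictured as attention over a pool of cached gist entries. The following is a minimal sketch under stated assumptions: each edit is assumed to be already compressed into one key vector and one value vector, and `select_from_gist_pool` is a hypothetical name. It shows plain single-head scaled dot-product attention, not the paper's actual cross-attention modules.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def select_from_gist_pool(query, gist_keys, gist_values):
    """Attend from one query vector over a pool of gist key/value pairs.

    Each pool entry stands for one compressed edit (an assumption of this
    sketch). Returns the attention weights and the weighted sum of values.
    """
    d = len(query)
    # Scaled dot-product score between the query and each gist key.
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in gist_keys
    ]
    weights = softmax(scores)
    dim = len(gist_values[0])
    # Mix the gist values according to the attention weights.
    mixed = [
        sum(w * value[j] for w, value in zip(weights, gist_values))
        for j in range(dim)
    ]
    return weights, mixed


# Toy pool of three compressed edits; the query aligns with the second key.
gist_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
gist_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, mixed = select_from_gist_pool([0.0, 5.0], gist_keys, gist_values)
```

Because every edit contributes a single pool entry in this sketch, the pool grows by one key/value pair per edit rather than by the full token length of the edit, which is the intuition behind avoiding the context-window limit.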
