CollabEdit: Towards Non-destructive Collaborative Knowledge Editing

Jiamu Zheng, Jinghuai Zhang, Tianyu Du, Xuhong Zhang, Jianwei Yin, Tao Lin

TL;DR

CollabEdit tackles non-destructive collaborative knowledge editing (KE) for large language models under strict privacy constraints. It introduces a model-merging-based framework that communicates $\mathbf{K}\mathbf{K}^{\top}$, enabling privacy-preserving aggregation of local edits while closely matching the performance of an ideal Global-Edit. The paper identifies three core collaborative KE challenges (knowledge overlap, knowledge conflict, and knowledge forgetting) and offers a targeted remedy for each: residual-based overlap detection, data-augmentation-driven conflict resolution, and a dynamic covariance approach to preserving memory across rounds. Empirical evaluation on Multi-CounterFact and zsRE with GPT-2 XL and GPT-J demonstrates robust editing performance and privacy protection with both MEMIT and MALMEN as editing backends.

Abstract

Collaborative learning of large language models (LLMs) has emerged as a new paradigm for utilizing private data from different parties while guaranteeing efficiency and privacy. Meanwhile, Knowledge Editing (KE) for LLMs has garnered increased attention due to its ability to manipulate the behaviors of LLMs explicitly, yet the collaborative KE case (in which knowledge edits of multiple parties are aggregated in a privacy-preserving and continual manner) remains unexamined. To this end, this manuscript presents the first investigation of collaborative KE, in which we start by carefully identifying its three unique challenges: knowledge overlap, knowledge conflict, and knowledge forgetting. We then propose a non-destructive collaborative KE framework, COLLABEDIT, which employs a novel model merging mechanism to mimic the global KE behavior while preventing a severe performance drop. Extensive experiments on two canonical datasets demonstrate the superiority of COLLABEDIT over destructive baselines, and the results shed light on addressing the three collaborative KE challenges and on future applications. Our code is available at https://github.com/LINs-lab/CollabEdit.

Paper Structure

This paper contains 41 sections, 1 theorem, 20 equations, 6 figures, 7 tables, 2 algorithms.

Key Result

Lemma 1

Take the KE method MEMIT as an example. Following the definitions given in the paper, we denote $\mathbf{C}$ as an aggregated statistic over the previously stored keys of existing knowledge and use $\mathbf{K}_i$ to represent the new keys derived from client $i$'s edit. The lemma then relates the weight update from Global-Edit to the weight updates from local editing. See the detailed proof in Appendix B.1.
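A relationship of this kind can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's implementation: it assumes the standard MEMIT closed-form update $\Delta = \mathbf{R}\mathbf{K}^{\top}(\mathbf{C} + \mathbf{K}\mathbf{K}^{\top})^{-1}$ and assumes that Global-Edit uses the column-wise concatenation of the clients' keys and residuals. Under those assumptions, each client only needs to share its update $\Delta_i$ and its Gram matrix $\mathbf{K}_i\mathbf{K}_i^{\top}$. The function names `memit_update` and `merge_updates`, the dimensions, and the random data are all hypothetical.

```python
import numpy as np


def memit_update(R, K, C):
    """Assumed MEMIT-style closed form: Delta = R K^T (C + K K^T)^{-1}."""
    A = C + K @ K.T
    # A is symmetric, so Delta A = R K^T is equivalent to A Delta^T = (R K^T)^T.
    return np.linalg.solve(A, (R @ K.T).T).T


def merge_updates(deltas, grams, C):
    """Merge local updates using only Delta_i and K_i K_i^T (hypothetical helper).

    Since Delta_i (C + K_i K_i^T) = R_i K_i^T, summing these products recovers
    the numerator of the global update without access to R_i or K_i.
    """
    numerator = sum(d @ (C + g) for d, g in zip(deltas, grams))
    A = C + sum(grams)
    return np.linalg.solve(A, numerator.T).T


rng = np.random.default_rng(0)
d_key, d_val, n_clients, n_edits = 16, 8, 3, 5

# Stand-in for the covariance statistic C over previously stored keys.
X = rng.normal(size=(d_key, 200))
C = X @ X.T / 200

# Each client holds private keys K_i and residuals R_i (random placeholders).
Ks = [rng.normal(size=(d_key, n_edits)) for _ in range(n_clients)]
Rs = [rng.normal(size=(d_val, n_edits)) for _ in range(n_clients)]

deltas = [memit_update(R, K, C) for R, K in zip(Rs, Ks)]
grams = [K @ K.T for K in Ks]

delta_merged = merge_updates(deltas, grams, C)
delta_global = memit_update(np.hstack(Rs), np.hstack(Ks), C)
delta_naive = sum(deltas) / n_clients  # plain averaging, for contrast

max_gap = float(np.max(np.abs(delta_merged - delta_global)))
print("max |merged - global| =", max_gap)
```

In this toy setting the merged update agrees with the global one up to floating-point error, while the plain average of the local updates does not. How residuals are actually computed for Global-Edit in the paper may differ from the concatenation assumed here.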

Figures (6)

  • Figure 1: Limits of existing KE methods under the collaborative KE scenarios on the Multi-CounterFact dataset (Meng et al., 2022).
  • Figure 2: Comparison of global KE (Global-Edit) and collaborative KE.
  • Figure 3: The $\ell_2$-norm of residual $\mathbf{R}$ when data replication happens.
  • Figure 4: An example of using data augmentation to address the problem of knowledge conflict.
  • Figure 5: We show the average embedding similarity between recovered sequences (inferred from $\mathbf{K}$ or $\mathbf{K} \mathbf{K}^{\top}$ involving $M$ sequences) and their ground truths. The grey line is the average embedding similarity between two random text sequences.
  • ...and 1 more figure

Theorems & Definitions (3)

  • Lemma 1: The relationship between the weight updates from Global-Edit and local editing
  • Remark 1
  • Remark 2: CollabEdit is effective for multi-round editing