O-Edit: Orthogonal Subspace Editing for Language Model Sequential Editing

Yuchen Cai, Ding Cao

TL;DR

This work proposes Orthogonal Subspace Editing, O-Edit, an algorithm that orthogonalizes the direction of each knowledge update, minimizing interference between successive updates and reducing the impact of new updates on unrelated knowledge.

Abstract

Large language models (LLMs) acquire knowledge during pre-training, but over time this knowledge may become incorrect or outdated, necessitating updates after training. Knowledge editing techniques address this issue without the need for costly re-training. However, most existing methods are designed for single edits, and as the number of edits increases they often degrade the model's overall performance, which poses significant challenges for sequential editing. To overcome this, we propose Orthogonal Subspace Editing, O-Edit. This algorithm orthogonalizes the direction of each knowledge update, minimizing interference between successive updates and reducing the impact of new updates on unrelated knowledge. Our approach does not require replaying previously edited data and processes each piece of edited knowledge as it arrives. It can perform thousands of edits on mainstream LLMs, achieving an average performance improvement of 4.2 times over existing methods while effectively preserving the model's performance on downstream tasks, all with minimal additional parameter overhead.

Paper Structure

This paper contains 31 sections, 32 equations, 8 figures, 10 tables, 2 algorithms.

Figures (8)

  • Figure 1: O-Edit constrains the direction of each update to lie within an orthogonal subspace.
  • Figure 2: The framework of O-Edit for sequential language model editing. (a) First, we compute gradients on a large amount of textual data without updating the model parameters. This step provides the gradient information necessary for updating the model's implicit knowledge. (b) Next, we impose constraints on the update directions for each piece of edited knowledge, ensuring these directions are orthogonal to each other as well as to the directions of the model's implicit knowledge.
  • Figure 3: The downstream task performance (%) of models edited by four editing methods with Mistral-7B and Llama3-8B on the COUNTERFACT dataset.
  • Figure 4: This figure illustrates the update directions of three editing methods on the COUNTERFACT dataset, with orthogonality values scaled by a factor of 10 for clarity. The horizontal and vertical axes represent the selected editing samples.
  • Figure 5: The activation score caused by unrelated parameters. X-axis: Number of edits.
  • ...and 3 more figures
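
The two steps in the Figure 2 caption can be illustrated with a small NumPy sketch. It is not the authors' implementation: it uses plain Gram-Schmidt projection as a stand-in for whatever constraint O-Edit actually applies, flattens each update to a vector, and replaces real gradients with random data. The helper names `project_out` and `extend_basis` are hypothetical.

```python
import numpy as np


def project_out(update, basis):
    """Remove from `update` its component inside span(basis).

    `basis` has orthonormal columns, shape (dim, k); k may be 0.
    """
    return update - basis @ (basis.T @ update)


def extend_basis(basis, direction, tol=1e-8):
    """Add `direction` to the orthonormal basis (one Gram-Schmidt step)."""
    residual = project_out(direction, basis)
    norm = np.linalg.norm(residual)
    if norm < tol:  # direction already lies in the protected subspace
        return basis
    return np.hstack([basis, (residual / norm)[:, None]])


rng = np.random.default_rng(0)
dim = 16  # assumed toy dimension; real updates are weight-sized

# (a) Stand-in for gradients computed on general text, with no parameter
#     update: these span the "implicit knowledge" directions to protect.
basis = np.zeros((dim, 0))
for grad in rng.normal(size=(3, dim)):
    basis = extend_basis(basis, grad)

# (b) Sequential edits: each raw update is projected onto the orthogonal
#     complement of everything protected so far, then protected itself.
applied = []
for raw_update in rng.normal(size=(4, dim)):
    step = project_out(raw_update, basis)
    applied.append(step)
    basis = extend_basis(basis, step)

applied = np.stack(applied)
gram = applied @ applied.T
# Largest inner product between two different applied updates.
max_overlap = np.abs(gram - np.diag(np.diag(gram))).max()
print(f"max overlap between distinct updates: {max_overlap:.2e}")
```

In this toy setting no earlier edits are replayed: only the basis of protected directions is carried forward, which loosely mirrors the abstract's claim that previously edited data need not be revisited.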