Untying the Reversal Curse via Bidirectional Language Model Editing

Jun-Yu Ma; Jia-Chen Gu; Zhen-Hua Ling; Quan Liu; Cong Liu

TL;DR

This work uncovers a reversal curse in language model editing: while current methods can insert and recall facts in the editing direction, they fail to recall those edits in the reverse direction. It introduces BAKE, a bidirectional benchmark with QA and judgment tasks across four relation categories, and a reversibility metric to quantify bidirectional recall. To mitigate the problem, it proposes BIRD, a bidirectional objective that reinforces invertible relationships between subject and object during editing. Empirical results across four LLMs show that BIRD improves reversibility (RJS) and generalization while exposing ongoing challenges in achieving high reverse-direction performance (RQS), highlighting the need for further bidirectional editing research for safer and more reliable knowledge manipulation in LLMs.

Abstract

Recent studies have demonstrated that large language models (LLMs) store a large amount of factual knowledge in their parameters. However, existing LLMs are prone to hallucinating unintended text because of false or outdated knowledge. Since retraining LLMs is resource-intensive, there has been growing interest in model editing. Despite the emergence of benchmarks and approaches, existing unidirectional editing and evaluation have failed to explore the reversal curse. Intuitively, if "The capital of France is" is edited to yield the counterfactual answer "London" within a model, then the model should be able to naturally reason about and recall the reverse fact, i.e., "London is the capital of" followed by "France" instead of "England". In this paper, we study bidirectional language model editing, aiming to provide a rigorous evaluation of whether edited LLMs can recall the edited knowledge bidirectionally. A new evaluation metric, reversibility, is introduced, and a benchmark dubbed Bidirectional Assessment for Knowledge Editing (BAKE) is constructed to evaluate the reversibility of edited models in recalling knowledge in the reverse direction of editing. Surprisingly, we observe that while current editing methods and LLMs can effectively recall edited facts in the direction of editing, they suffer serious deficiencies when evaluated in the reverse direction. To mitigate the reversal curse, a method named Bidirectionally Inversible Relationship moDeling (BIRD) is proposed. It designs a set of editing objectives that incorporate bidirectional relationships between subject and object into the updated model weights. Experiments show that BIRD improves the performance of four representative LLMs of different sizes on question answering and judgement.
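The France/London example in the abstract can be made concrete with a toy sketch. It is not the paper's evaluation code: a plain dictionary stands in for an edited model, and the names `edit_forward` and `recalls` are hypothetical helpers introduced only for illustration. The point it shows is that an edit applied only to the forward prompt leaves the reverse prompt unchanged.

```python
# Toy illustration only: a dict of prompt -> answer stands in for a model.
# All names here are hypothetical and do not come from the paper.

def edit_forward(store, forward_prompt, new_object):
    """Return a copy of the store with only the forward prompt overwritten."""
    edited = dict(store)
    edited[forward_prompt] = new_object
    return edited

def recalls(store, prompt, expected):
    """True if the stand-in model completes `prompt` with `expected`."""
    return store.get(prompt) == expected

# Knowledge before editing.
store = {
    "The capital of France is": "Paris",
    "London is the capital of": "England",
}

# Counterfactual edit in the forward direction: France -> London.
edited = edit_forward(store, "The capital of France is", "London")

# Forward recall succeeds, but the reverse prompt still yields "England".
forward_ok = recalls(edited, "The capital of France is", "London")
reverse_ok = recalls(edited, "London is the capital of", "France")
print(forward_ok, reverse_ok)
```

In this stand-in, a bidirectional evaluation would count the edit as successful only if both checks pass, which is the gap the abstract describes.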

Paper Structure (34 sections, 8 equations, 8 figures, 6 tables)

Introduction
Related Work
Model Editing
The Reversal Curse of LLMs
Preliminary
Model Editing
Rank-One Model Editing (ROME)
BAKE: Bidirectional Assessment for Knowledge Editing
Relation Category
Data Construction of BAKE-Q&J
Constructing counterfactual edits
Constructing reverse prompts
Data Construction of BAKE-J
Dataset Summary
Dataset Format
...and 19 more sections

Figures (8)

Figure 1: Comparison of unidirectional evaluation paradigms that assess whether edited models can recall the editing facts (a) via single-hop questions, or (b) via entailed questions in the direction of editing; and (c) the proposed bidirectional paradigm that assesses model editing in the reverse direction of editing. $f_{\theta}$ / $f_{\theta_{e}}$ denotes the models before / after editing.
Figure 2: Examples of four relation categories. A given relation is considered (a) one-to-one if a subject can be associated with at most one object; (b) one-to-many if a subject can be associated with multiple objects; (c) many-to-one if multiple subjects can be linked to the same object; or (d) many-to-many if multiple subjects can be linked to multiple objects.
Figure 3: Illustration of BIRD which (a) enhances the association of the new fact bidirectionally, and (b) weakens the association of the old fact bidirectionally.
Figure 4: Average log probability of the original and desired answers after editing on the reverse-qa prompts. The closer the value is to 0, the higher the probability.
Figure 5: Performance of BIRD with different $\alpha$ (a) or $\beta$ (b) on BAKE-J in terms of RJS.
...and 3 more figures
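The four relation categories described in the Figure 2 caption can be sketched as a small classifier over (subject, object) pairs. This is an illustrative assumption rather than the benchmark's construction procedure: the function name `relation_category` and the example pairs are hypothetical, and the category is inferred only from the pairs supplied.

```python
from collections import defaultdict

def relation_category(pairs):
    """Classify a relation from observed (subject, object) pairs.

    Hypothetical helper: the category is inferred from the given pairs only.
    """
    objects_of = defaultdict(set)   # subject -> set of objects
    subjects_of = defaultdict(set)  # object -> set of subjects
    for subject, obj in pairs:
        objects_of[subject].add(obj)
        subjects_of[obj].add(subject)

    # Does any subject have several objects, or any object several subjects?
    multi_obj = any(len(objs) > 1 for objs in objects_of.values())
    multi_subj = any(len(subs) > 1 for subs in subjects_of.values())

    if multi_obj and multi_subj:
        return "many-to-many"
    if multi_obj:
        return "one-to-many"
    if multi_subj:
        return "many-to-one"
    return "one-to-one"

# Made-up example pairs, one list per category.
capital_of = [("France", "Paris"), ("Japan", "Tokyo")]
parent_of = [("Alice", "Bob"), ("Alice", "Carol")]
born_in = [("Bob", "Paris"), ("Carol", "Paris")]
speaks = [("Bob", "French"), ("Bob", "English"), ("Carol", "English")]

for name, rel in [("capital_of", capital_of), ("parent_of", parent_of),
                  ("born_in", born_in), ("speaks", speaks)]:
    print(name, relation_category(rel))
```

The category matters for reverse evaluation because a reverse prompt has a single expected answer only when each object maps back to one subject.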
