Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

Peter Hase, Thomas Hofweber, Xiang Zhou, Elias Stengel-Eskin, Mohit Bansal

TL;DR

This paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research. It also introduces a semi-synthetic dataset for model editing based on Wikidata, where edits can be evaluated against labels given by an idealized Bayesian agent.

Abstract

The model editing problem concerns how language models should learn new facts about the world over time. While empirical research on model editing has drawn widespread attention, the conceptual foundations of model editing remain shaky -- perhaps unsurprisingly, since model editing is essentially belief revision, a storied problem in philosophy that has eluded succinct solutions for decades. Model editing nonetheless demands a solution, since we need to be able to control the knowledge within language models. With this goal in mind, this paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research. We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place. Many of these challenges are extremely difficult to address, e.g., determining far-reaching consequences of edits, labeling probabilistic entailments between facts, and updating beliefs of agent simulators. Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent. This enables us to say exactly how belief revision in language models falls short of a desirable epistemic standard. We encourage further research in settings where model behavior can be compared against such a gold standard. Our code is publicly available at: https://github.com/peterbhase/LLM-belief-revision

Paper Structure (31 sections, 5 equations, 4 figures, 6 tables)

This paper contains 31 sections, 5 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: In the predominant formulation of model editing, an LLM's weights are updated so that it gives a new output for a specific input. Even for a simple new fact about the world, however, it can be hard to specify its exact consequences in theory (Sec. \ref{sec:challenges_defining}), or it may be challenging to crowdsource labels for the data in practice (Sec. \ref{sec:challenges_benchmarks}). It is also not clear that LLMs have coherent, revisable beliefs to begin with (Sec. \ref{sec:challenges_editable_LLMs}).
  • Figure 2: A requested edit and test cases in our dataset. We edit a language model with the requested edit, after pretraining on a semi-synthetic corpus. Our test cases measure how close the edited LM's probabilities are to posterior probabilities from a Bayesian model fit to both the pretraining corpus and the requested edit.
  • Figure 3: We train an 83M-parameter Transformer on our corpus for 1B tokens, achieving a good fit to the underlying facts.
  • Figure 4: We edit our model to replace the fact "Grace Stone Coates educated at scions" with the fact "Grace Stone Coates educated at san salvador university". While the model successfully learns a high probability for the edit request sentence, the edited model fails to generalize properly to downstream entailed sentences (probabilistic coherence) or logically related sentences (logical coherence). For instance, in our hypothetical world the subject's most likely occupation should change from hollywood producer $\rightarrow$ Politician, but the LM does not respect this inference. Ideally, LLMs would achieve the same beliefs as a rational Bayesian agent (whose posterior credences are shown in blue and pre-update credences in red). For logical coherence, $A$ is the edit request sentence "$s \ r \ o$", and $B$ is another arbitrary sentence.
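
The comparison described in Figures 2 and 4, between an edited LM's probabilities and the posterior of a Bayesian agent, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simple Dirichlet-categorical model over the objects of one (subject, relation) pair, and it treats the requested edit as a fixed number of extra observations. The counts, the edit weight, the LM probabilities, and the function names are all hypothetical.

```python
from collections import Counter


def dirichlet_posterior(counts, support, prior=1.0):
    """Posterior mean of a categorical distribution under a symmetric
    Dirichlet prior (assumed model; the paper's exact model may differ)."""
    total = sum(counts.get(o, 0) for o in support) + prior * len(support)
    return {o: (counts.get(o, 0) + prior) / total for o in support}


def mean_abs_error(lm_probs, target_probs):
    """Average absolute gap between LM probabilities and target credences."""
    gaps = [abs(lm_probs.get(o, 0.0) - p) for o, p in target_probs.items()]
    return sum(gaps) / len(gaps)


# Hypothetical candidate objects for "Grace Stone Coates educated at ...".
support = ["scions", "san salvador university"]

# Hypothetical corpus evidence before the edit.
corpus_counts = Counter({"scions": 8})

# Assumption: the requested edit counts as 20 extra observations.
edit_weight = 20
post_counts = corpus_counts + Counter({"san salvador university": edit_weight})

pre_update = dirichlet_posterior(corpus_counts, support)
posterior = dirichlet_posterior(post_counts, support)

# Hypothetical probabilities read off an edited LM.
lm_probs = {"scions": 0.05, "san salvador university": 0.95}
error = mean_abs_error(lm_probs, posterior)

print("pre-update credences:", pre_update)
print("posterior credences: ", posterior)
print("LM vs. posterior mean absolute error:", round(error, 3))
```

In this toy setup the edited LM is more confident in the new fact than the assumed Bayesian agent would be, so the gap is nonzero even though the edit request itself "succeeded". The same measurement could be repeated on downstream sentences, such as the subject's occupation, to probe probabilistic coherence.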