Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models

Xiyu Liu, Zhengxiao Liu, Naibin Gu, Zheng Lin, Wanli Ma, Ji Xiang, Weiping Wang

TL;DR

The paper identifies a critical over-generalization problem in knowledge editing for auto-regressive transformers when edits target subject knowledge alone. It reveals that relational information is progressively encoded and recovered, with the last relation token serving as the pivotal point for recall and the MLP sublayers driving the accumulation of relation attributes. Building on this, the authors propose RETS, a relation-focused editing method that modifies the middle-late MLP at the last relation token while enforcing subject constraints to prevent unintended changes to neighboring facts. Empirical evidence on COUNTERFACT and zsRE shows RETS significantly improves Relation Specificity by over 30% and maintains competitive performance on key editing metrics, supporting the proposed relation-focused interpretation and highlighting new directions for knowledge editing in transformers.

Abstract

The storage and recall of factual associations in auto-regressive transformer language models (LMs) have drawn a great deal of attention, inspiring knowledge editing by directly modifying the located model weights. Most editing works achieve knowledge editing under the guidance of existing interpretations of knowledge recall that mainly focus on subject knowledge. However, these interpretations are seriously flawed: they neglect relation information, which leads to an over-generalizing problem in editing. In this work, we discover a novel relation-focused perspective to interpret the knowledge recall of transformer LMs during inference and apply it to single knowledge editing to avoid over-generalizing. Experimental results on the dataset supplemented with a new R-Specificity criterion demonstrate that our editing approach significantly alleviates over-generalizing while remaining competitive on other criteria, breaking the domination of subject-focused editing for future research.

Paper Structure (35 sections, 14 equations, 8 figures, 9 tables)


Figures (8)

  • Figure 1: The over-generalizing problem. The green circle denotes the correctly edited target entity; the red circles denote entities unrelated to the target edit that are also changed unexpectedly.
  • Figure 2: Average Indirect Effect of Relation results for MLP and MHSA sublayers over 1000 facts on GPT2-XL. X-axis shows the layers. "rp" stands for the relation prefix in front of the subject (e.g. "The mother tongue language of" in the input prompt "The mother tongue language of Isabelle Breitman is"). "fs", "ms" and "ls" stand for the first-subject token, middle-subject tokens and last-subject token. "fr", "mr" and "lr" stand for the first-relation token, middle-relation tokens and last-relation token. "*" marks the intervened tokens in the corrupted run.
  • Figure 3: Factual information detected through the vocabulary lens of the last-relation representation for GPT2-XL over 1000 prompts. (a) The average attribute rates, shown as yellow bars. (b) The decline in the average attribute rate at the 48th layer when blocking the MLP or MHSA sublayer, respectively. (c) The average rankings of the target objects and random tokens.
  • Figure 4: Our RETS method based on the relation-focused recall of factual associations. We reveal that the last-relation representation encodes relation-related attributes (A) which are accumulated until middle-late layers and (B) the predicted object is extracted from these attributes. Based on this relation-focused interpretation, we propose the RETS knowledge editing method that (C) modifies the middle-late MLP sublayer with the constraints of the subject.
  • Figure 5: The performance of RETS (purple lines) editing at the last-relation token on different layers (x-axis) compared with ROME (orange line) editing at the last-subject token for 50 prompts. Standard deviation is shown as shaded areas.
  • ...and 3 more figures
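
The position abbreviations in the Figure 2 caption ("rp", "fs", "ms", "ls", "fr", "mr", "lr") can be made concrete with a small labeling helper. The sketch below is illustrative only and is not taken from the paper: the function names `label_positions` and `_within` are hypothetical, the prompt is split on whitespace rather than with the GPT2-XL tokenizer, and labeling a single-token span as "last" (rather than "first") is an assumption.

```python
from typing import List, Tuple


def _within(offset: int, length: int, suffix: str) -> str:
    """Return first/middle/last label for a position inside a span.

    Assumption: a single-token span is labeled as "last".
    """
    if offset == length - 1:
        return "l" + suffix
    if offset == 0:
        return "f" + suffix
    return "m" + suffix


def label_positions(n_tokens: int, subject_span: Tuple[int, int]) -> List[str]:
    """Label every token index with the Figure 2 abbreviations.

    subject_span is (start, end) with end exclusive. Tokens before the
    subject form the relation prefix ("rp"); tokens after it are the
    relation tokens ("fr", "mr", "lr").
    """
    start, end = subject_span
    if not (0 <= start < end <= n_tokens):
        raise ValueError("subject_span must lie inside the prompt")
    labels = []
    for i in range(n_tokens):
        if i < start:
            labels.append("rp")
        elif i < end:
            labels.append(_within(i - start, end - start, "s"))
        else:
            labels.append(_within(i - end, n_tokens - end, "r"))
    return labels


# Whitespace split of the caption's example prompt (a simplification).
prompt = "The mother tongue language of Isabelle Breitman is".split()
labels = label_positions(len(prompt), (5, 7))
for token, label in zip(prompt, labels):
    print(f"{label:>2}  {token}")
```

Under this simplified tokenization, the final word "is" is the only token after the subject, so it is the last-relation position that the TL;DR describes as the pivotal point for recall.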