Trace and Edit Relation Associations in GPT

Jiahang Li, Taoyu Chen, Yuanli Wang

TL;DR

The paper asks where entity-relational knowledge resides in GPT-like transformers and how it can be edited. It introduces relation tracing, which uses causal mediation analysis augmented with counterfactual data, to identify critical early- and late-layer MLP components and to understand how relations are stored and recalled. The editing method is benchmarked against ROME on FewRel. A modified ROME approach applies a rank-one update to the fifth MLP layer, yielding improved generalization and specificity in relation edits, with paraphrase success rising from 40.71% to 41.07%. The work demonstrates the feasibility of precise, layer-targeted edits to relational knowledge, with implications for controlled knowledge manipulation in language models, and points to future architectural and methodological refinements.

Abstract

This study introduces a novel approach for analyzing and modifying entity relationships in GPT models, diverging from ROME's entity-focused methods. We develop a relation tracing technique to understand the influence of language model computations on relationship judgments. Using the FewRel dataset, we identify key roles of MLP modules and attention mechanisms in processing relationship information. Our method, tested against ROME on a new dataset, shows improved balance in specificity and generalization, underscoring the potential of manipulating early-layer modules for enhanced model understanding and accuracy.
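
To make the rank-one update mentioned in the TL;DR concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy matrices, and the optional `u` argument are hypothetical. It also assumes the key covariance is the identity unless a pre-whitened direction `u` (playing the role of C^{-1}k in ROME) is supplied. The update adds a single outer product to a weight matrix `W` so that a chosen key `k` maps exactly to a new value `v`.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in W]


def rank_one_edit(W, k, v, u=None):
    """Return W + (v - W k) u^T / (u . k), so that the result maps k to v.

    u is an assumed, optional direction standing in for C^{-1} k;
    when omitted, the identity covariance is assumed and u = k.
    """
    if u is None:
        u = k
    denom = dot(u, k)
    if abs(denom) < 1e-12:
        raise ValueError("u and k must not be orthogonal")
    # Residual between the desired value and what W currently produces for k.
    residual = [vi - wi for vi, wi in zip(v, matvec(W, k))]
    # Add the outer product residual * u^T / denom to W, row by row.
    return [
        [w + r * uj / denom for w, uj in zip(row, u)]
        for row, r in zip(W, residual)
    ]


# Toy example (hypothetical numbers): a 2x3 weight matrix.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
k = [1.0, 2.0, 0.0]        # key to edit
v = [0.5, -3.0]            # value the key should map to after the edit
W_new = rank_one_edit(W, k, v)

k_orth = [2.0, -1.0, 0.0]  # a key orthogonal to k
edited_output = matvec(W_new, k)
untouched_before = matvec(W, k_orth)
untouched_after = matvec(W_new, k_orth)
```

In this toy setting, `edited_output` matches `v`, while the output for the orthogonal key `k_orth` is unchanged. This is a simplified illustration of why a rank-one edit can change one association while leaving others largely intact.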

Paper Structure

This paper contains 14 sections, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Data item used to evaluate modifications to different MLP layers
  • Figure 2: Impact of the relation on output probability
  • Figure 3: Relation distribution in the evaluation dataset
  • Figure 4: Visualized performance and variance after modifying the corresponding MLP layer in GPT