Trace and Edit Relation Associations in GPT
Jiahang Li, Taoyu Chen, Yuanli Wang
TL;DR
The paper investigates where entity-relational knowledge is stored in GPT-like transformers and how it can be edited. It introduces relation tracing, which applies causal mediation analysis augmented with counterfactual data to identify the early- and late-layer MLP components that matter most for storing and recalling relations, and it benchmarks the resulting editing method against ROME on FewRel. A modified ROME approach applies a rank-one update to the fifth MLP layer, yielding improved generalization and specificity in relation edits, with paraphrase success rising from 40.71% to 41.07%. The work demonstrates that precise, layer-targeted edits of relational knowledge are feasible, with implications for controlled knowledge manipulation in language models, and it points to future architectural and methodological refinements.
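To make the idea of a rank-one update concrete, here is a minimal NumPy sketch of the closed-form edit used by the original ROME method, applied to a single linear map. It is an illustration under stated assumptions, not the modified procedure this paper proposes: the function name `rank_one_edit`, the toy dimensions, and the random key and value vectors are all hypothetical, and choosing the edited layer (the fifth MLP layer in the paper) is not modeled.

```python
import numpy as np

def rank_one_edit(W, C, k_star, v_star):
    """Return W' = W + lam (C^{-1} k*)^T so that W' @ k_star == v_star.

    W      : (d_out, d_in) weight matrix of the edited projection
    C      : (d_in, d_in) uncentered covariance of previously seen keys
    k_star : (d_in,) key vector that selects the association to edit
    v_star : (d_out,) value vector the key should map to after the edit
    """
    c_inv_k = np.linalg.solve(C, k_star)   # C^{-1} k*
    residual = v_star - W @ k_star         # what the current weights get wrong
    lam = residual / (c_inv_k @ k_star)    # scale so the constraint holds exactly
    return W + np.outer(lam, c_inv_k)      # rank-one correction

# Toy data (hypothetical): random weights, keys, and target value.
rng = np.random.default_rng(0)
d_in, d_out = 8, 6
W = rng.normal(size=(d_out, d_in))
K = rng.normal(size=(d_in, 100))           # 100 sample keys as columns
C = K @ K.T / K.shape[1]                   # key covariance estimate
k_star = rng.normal(size=d_in)
v_star = rng.normal(size=d_out)

W_new = rank_one_edit(W, C, k_star, v_star)
```

After the edit, `W_new @ k_star` equals `v_star` up to floating-point error, while the change `W_new - W` has rank one. Weighting the update direction by the inverse key covariance is what keeps the effect on other keys small.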
Abstract
This study introduces a novel approach for analyzing and modifying entity relationships in GPT models, diverging from ROME's entity-focused methods. We develop a relation tracing technique to understand how language model computations influence relationship judgments. Using the FewRel dataset, we identify the key roles that MLP modules and attention mechanisms play in processing relationship information. Our method, tested against ROME on a new dataset, shows a better balance between specificity and generalization, underscoring the potential of manipulating early-layer modules for enhanced model understanding and accuracy.
