Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge

Daniel Tamayo, Aitor Gonzalez-Agirre, Javier Hernando, Marta Villegas

TL;DR

The paper tackles the challenge of reliable knowledge editing in transformer-based language models, with a focus on cross-lingual capabilities. It extends MEMIT by introducing MEMAT, which identifies and optimizes a set of attention heads in a secondary language to refine edited factual associations, guided by an ITI-inspired framework. Across English and Catalan, MEMAT shows significant gains over MEMIT on multiple metrics with minimal parameter changes, and demonstrates portability to unseen languages, aided by a cross-lingual head-selection strategy. The work highlights the roles of subject tokenization and language-independent attention signals in cross-lingual knowledge editing, and lays groundwork for more language-robust, explainable editing methods.

Abstract

Recent research has explored methods for updating and modifying factual knowledge in large language models, often focusing on specific multi-layer perceptron blocks. This study expands on this work by examining the effectiveness of existing knowledge editing methods across languages and delving into the role of attention mechanisms in this process. Drawing from the insights gained, we propose Mass-Editing Memory with Attention in Transformers (MEMAT), a method that achieves significant improvements in all metrics while requiring minimal parameter modifications. MEMAT delivers a remarkable 10% increase in magnitude metrics, benefits languages not included in the training data and also demonstrates a high degree of portability. Our code and data are at https://github.com/dtamayo-nlp/MEMAT.

Paper Structure

This paper contains 24 sections, 7 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Results of Efficacy, Generalization and Specificity when applying MEMIT separately in two different languages and evaluating the effects of training in both. Each depicted line shows a restriction in the tokenization of the subjects.
  • Figure 2: Accuracy on the validation set for all heads in all layers in Ǎguila-7B considering two combinations of $L_1$ and $L_2$. The performance peaks include 78.1% and 82.2%. The number of samples introduced using MEMIT is 1,000.
  • Figure 3: Illustration depicting the key steps of MEMAT in Ǎguila-7B. The dataset languages, denoted as $L_1$ and $L_2$, are not restricted to differ or remain equal, but in this diagram we consider both datasets to store the same triplets. The Eagle images were generated using GPT-4.
  • Figure 4: MEMIT and MEMAT scaling curves showing performance in English and Catalan against the number of edits (log scale) when only Catalan training data is used. The error corresponds to a 68% confidence interval.
  • Figure 5: Example of a Catalan CounterFact sample.
  • ...and 6 more figures