A Survey on Knowledge Editing of Neural Networks

Vittorio Mazzia, Alessandro Pedrani, Andrea Caciolai, Kay Rottmann, Davide Bernardi

TL;DR

This survey formalizes knowledge editing as targeted, possibly sequential, updates to pre-trained models that correct specific knowledge failures while preserving prior behavior. It classifies approaches into four families (regularization, meta-learning, direct model editing, and architectural strategies) and surveys tasks, datasets, and benchmarks across computer vision, NLP, safety-critical systems, and graphs. Key contributions include a unified problem statement, a taxonomy of edit types, definitions of the desired editing properties (reliability, generality, locality, efficiency), and standard evaluation metrics (SR, GR, ES, DD, plus retainment variants). The work highlights the practical significance of efficient and data-efficient edits for large models, discusses scalability challenges, and outlines future research directions and risks, including runtime editing, interpretability, and security considerations.

Abstract

Deep neural networks are becoming increasingly pervasive in academia and industry, matching and surpassing human performance in a wide variety of fields and related tasks. However, just like humans, even the largest artificial neural networks make mistakes, and once-correct predictions can become invalid as the world changes over time. Augmenting datasets with samples that account for mistakes or up-to-date information has become a common workaround in practical applications. Yet the well-known phenomenon of catastrophic forgetting makes it difficult to achieve precise changes to the knowledge implicitly memorized in neural network parameters, often requiring full model re-training to achieve the desired behavior. This is expensive, unreliable, and incompatible with the current trend of large-scale self-supervised pre-training, making it necessary to find more efficient and effective methods for adapting neural network models to changing data. To address this need, knowledge editing is emerging as a novel area of research that aims to enable reliable, data-efficient, and fast changes to a pre-trained target model, without affecting model behavior on previously learned tasks. In this survey, we provide a brief review of this recent field of artificial intelligence research. We first introduce the problem of editing neural networks, formalize it in a common framework, and differentiate it from better-known branches of research such as continual learning. Next, we review the most relevant knowledge editing approaches and datasets proposed so far, grouping works into four families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, we outline some intersections with other fields of research and potential directions for future work.

Paper Structure (27 sections, 13 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: The knowledge editing problem was first proposed as the task of modifying a model based on a set of individual edit pairs, in a non-sequential manner (a), sinitsineditable. Successive works extended the problem to batches of edits (b), sequential individual edits (c), and sequential batches of edits (d). Evaluation metrics are similar across all cases, as described in the section on evaluation metrics.
  • Figure 2: Scaling curves showing three evaluation metrics as the number of non-successive batch edits increases, for three KE methodologies: MEND, ROME, and MEMIT. Results are computed using CounterFact and GPT-J. Locality is shown not as drawdown (DD), but as its complement, specificity, over a neighborhood of samples meng2022locating. ROME and MEND perform well up to ten edits but degrade rapidly, losing almost all SR before reaching batches of 1k edits. MEMIT, on the other hand, performs well with considerably larger batches of edits. Adapted from meng2022mass.
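The four regimes in Figure 1 differ only in how many edits are applied at once and whether each round starts from the original model or from the previously edited one. The Python sketch below illustrates this under explicit assumptions: the "model" is a plain dictionary from inputs to predictions, `overwrite_editor` and `evaluate_regime` are hypothetical helpers (not MEND, ROME, or MEMIT), SR is taken to be the fraction of edit pairs whose prediction matches the desired output, and locality is measured as specificity, i.e. the fraction of neighborhood inputs whose prediction is unchanged. The survey's exact metric definitions are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a toy "model" is a lookup table from inputs to predictions.
Edit = Tuple[str, str]                      # (edit input x_e, desired output y_e)
Model = Dict[str, str]
Editor = Callable[[Model, Sequence[Edit]], Model]


def success_rate(model: Model, edits: Sequence[Edit]) -> float:
    """Assumed SR: fraction of edit pairs whose prediction equals the target."""
    if not edits:
        return 0.0
    hits = sum(1 for x, y in edits if model.get(x) == y)
    return hits / len(edits)


def specificity(before: Model, after: Model, neighborhood: Sequence[str]) -> float:
    """Assumed locality: fraction of neighborhood predictions left unchanged."""
    if not neighborhood:
        return 1.0
    same = sum(1 for x in neighborhood if before.get(x) == after.get(x))
    return same / len(neighborhood)


def overwrite_editor(model: Model, batch: Sequence[Edit]) -> Model:
    """Hypothetical toy editor: copies the model and overwrites each edited input."""
    edited = dict(model)                    # never mutate the caller's model
    for x, y in batch:
        edited[x] = y
    return edited


def evaluate_regime(base: Model, edits: Sequence[Edit], editor: Editor,
                    batch_size: int, sequential: bool,
                    neighborhood: Sequence[str]) -> List[Tuple[float, float]]:
    """Return (SR, specificity) after each round of editing.

    batch_size=1, sequential=False -> (a) individual, non-sequential
    batch_size>1, sequential=False -> (b) batch, non-sequential
    batch_size=1, sequential=True  -> (c) sequential individual edits
    batch_size>1, sequential=True  -> (d) sequential batches
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    scores: List[Tuple[float, float]] = []
    current = base
    applied: List[Edit] = []
    for i in range(0, len(edits), batch_size):
        batch = list(edits[i:i + batch_size])
        # Non-sequential regimes restart from the original model every round.
        start = current if sequential else base
        current = editor(start, batch)
        if sequential:
            applied.extend(batch)           # earlier edits must still hold
        else:
            applied = batch                 # only this round's edits count
        scores.append((success_rate(current, applied),
                       specificity(base, current, neighborhood)))
    return scores


if __name__ == "__main__":
    base = {"capital of Italy": "Rome", "CEO of X": "Alice", "color of sky": "blue"}
    edits = [("CEO of X", "Bob"), ("CEO of X", "Carol")]
    neighborhood = ["capital of Italy", "color of sky"]
    print(evaluate_regime(base, edits, overwrite_editor, 1, False, neighborhood))
    # [(1.0, 1.0), (1.0, 1.0)]
    print(evaluate_regime(base, edits, overwrite_editor, 1, True, neighborhood))
    # [(1.0, 1.0), (0.5, 1.0)]
```

In the sequential run the second edit overwrites the first, so the cumulative SR drops to 0.5, whereas the non-sequential run resets to the base model each round and scores every edit in isolation. This is one simple way in which interactions between successive edits surface only under sequential evaluation.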