The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

Ryosuke Takahashi; Go Kamoda; Benjamin Heinzerling; Keisuke Sakaguchi; Kentaro Inui

TL;DR

The paper addresses privacy and safety concerns in language models by examining how deleting one piece of knowledge affects other stored knowledge, especially when the deleted knowledge involves a popular entity. It sets up a controlled experiment: LMs are trained on synthetic knowledge graphs (Erdős-Rényi, ER, and Barabási-Albert, BA), and knowledge is then deleted with the ROME editing method, which uses causal tracing to locate the relevant feed-forward (FFN) components and a rank-one update to apply the edit. Deleting knowledge linked to frequently occurring entities causes substantial, even catastrophic, side effects in models trained on BA graphs, while models trained on ER graphs show little or no such dependence, which underscores the influence of the underlying knowledge topology. The work introduces synthetic knowledge graphs as a testbed for analyzing knowledge deletion, with implications for safe knowledge editing and privacy-preserving practices in real-world LMs.
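To make the rank-one update mentioned above concrete, here is a minimal sketch rather than the authors' implementation. It assumes a simplified form of the update in which the key covariance is the identity (ROME's published update weights the key by an inverse key-covariance matrix), and it models "deletion" as remapping a key to a zero value vector, which is a hypothetical stand-in. The names `matvec`, `rank_one_edit`, and all toy vectors are illustrative.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, k, v_new):
    """Return W' = W + r k^T / (k . k), where r = v_new - W k.

    The result maps the key k exactly to v_new and differs from W by a
    rank-one matrix. Assumption: identity key covariance (simplified form).
    """
    k_norm_sq = sum(ki * ki for ki in k)
    if k_norm_sq == 0:
        raise ValueError("key vector must be non-zero")
    residual = [vn - vo for vn, vo in zip(v_new, matvec(W, k))]
    return [
        [w + r * kj / k_norm_sq for w, kj in zip(row, k)]
        for row, r in zip(W, residual)
    ]


# Toy 2x3 weight matrix acting as a key -> value memory (hypothetical values).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

k_target = [1.0, 0.0, 0.0]        # key of the fact being edited
k_orthogonal = [0.0, 1.0, 0.0]    # key with no overlap with k_target
k_overlapping = [1.0, 1.0, 0.0]   # key that shares a direction with k_target

# Hypothetical deletion target: the edited key should now map to zeros.
W_edited = rank_one_edit(W, k_target, [0.0, 0.0])

print("edited key      :", matvec(W_edited, k_target))
print("orthogonal key  :", matvec(W, k_orthogonal), "->", matvec(W_edited, k_orthogonal))
print("overlapping key :", matvec(W, k_overlapping), "->", matvec(W_edited, k_overlapping))
```

In this toy setting the edit leaves the output for the orthogonal key unchanged and shifts the output for the overlapping key. That is only an algebraic property of the simplified update; whether it accounts for the side effects reported in the paper is not something this sketch establishes.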

Abstract

Language models (LMs) encode world knowledge in their internal parameters through training. However, LMs may learn personal and confidential information from the training data, leading to privacy concerns such as data leakage. Therefore, research on knowledge deletion from LMs is essential. This study focuses on the knowledge stored in LMs and analyzes the relationship between the side effects of knowledge deletion and the entities related to the knowledge. Our findings reveal that deleting knowledge related to popular entities can have catastrophic side effects. Furthermore, this research is the first to analyze knowledge deletion in models trained on synthetic knowledge graphs, indicating a new direction for controlled experiments.

Paper Structure (15 sections, 2 equations, 6 figures, 3 tables)

This paper contains 15 sections, 2 equations, 6 figures, 3 tables.

Introduction
Experimental Design and Approach
Experimental Setup
Knowledge Graphs
Storing Knowledge Graphs in LMs
Knowledge Editing
Step 1: Causal Tracing
Step 2: Rank-One Model Editing
Knowledge Deletion from LMs
Experiment
Procedure
Results and Discussion
Conclusion
Recognition of Paraphrased Representation in LMs
Supplemental Information on the Knowledge Editing Method

Figures (6)

Figure 1: Overview of the analysis flow for the side effects of knowledge deletion using a synthetic knowledge graph: 1. First, create a synthetic knowledge graph. 2. Train the LM on the created knowledge graph. 3. Apply the knowledge editing method, ROME, to delete a specific knowledge instance. 4. Analyze the side effects of the deleted knowledge by comparing the model's accuracy on the trained knowledge before and after the deletion. As a result, we reveal that deleting knowledge related to popular entities has catastrophic side effects.
Figure 2: Two synthetic knowledge graphs we created. (Left) Erdős-Rényi graph: Features a relatively uniform degree distribution of vertices, representing a simple structure. (Right) Barabási-Albert graph: The degree distribution of vertices follows a power law, reflecting the properties of complex networks in the real world.
Figure 3: Results of the principal component analysis (PCA) on the embedding representations in the LM after training on ER graphs. The left side shows the PCA results for entity embeddings, and the right side shows the PCA results for relation embeddings. Each entity and relation has five paraphrases, and paraphrases of the same entity or relation are shown in the same color (here, paraphrases of six entities and relations are highlighted). The embeddings of paraphrases cluster together, suggesting that the LM recognizes paraphrases.
Figure 4: The relationship between the degree of entities, or subjects, and the impact of their deletion in a 6-layer GPT model trained on the knowledge graph. The horizontal axis lists the entities, the left vertical axis shows each entity's degree, and the right vertical axis shows the impact (amount of side effects) on other knowledge when a knowledge instance related to that entity is deleted. In the ER graph, there is no relationship between an entity's degree (i.e., its number of connections) and the impact of its deletion. In the BA graph, there is a clear relationship between the two. Because deleting knowledge from LMs can have significant side effects on related knowledge, it is advisable to avoid deleting knowledge related to frequent entities in LMs trained on knowledge structures that are closer to the real world, as doing so may have catastrophic consequences.
Figure 5: PCA results of the embedding representations in the LM before training on the ER graph.
...and 1 more figure
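The two graph families described in the Figure 2 caption can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' generation code: the graphs here are undirected and carry no relation labels, and the sizes (`n = 200`, `m = 3`), the seed, and the function names are assumptions chosen only to give both graphs a similar expected number of edges.

```python
import random


def erdos_renyi_edges(n, p, rng):
    """G(n, p): keep each of the n*(n-1)/2 possible edges independently with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def barabasi_albert_edges(n, m, rng):
    """Preferential attachment: each new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    # Seed: node m is connected to nodes 0..m-1.
    edges = [(u, m) for u in range(m)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((t, new))
            endpoints.extend((t, new))
    return edges


def degrees(n, edges):
    """Degree of every node in an undirected edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


n, m = 200, 3                          # hypothetical sizes, not from the paper
rng = random.Random(0)
ba_edges = barabasi_albert_edges(n, m, rng)
# Pick p so the ER graph has the same expected number of edges as the BA graph.
p = len(ba_edges) / (n * (n - 1) / 2)
er_edges = erdos_renyi_edges(n, p, rng)

ba_deg = degrees(n, ba_edges)
er_deg = degrees(n, er_edges)
print("BA: edges =", len(ba_edges), "max degree =", max(ba_deg))
print("ER: edges =", len(er_edges), "max degree =", max(er_deg))
```

Comparing the sorted degree lists of the two graphs is a quick way to see the contrast the caption describes: preferential attachment tends to produce a few high-degree hubs, while independent edge sampling keeps degrees close to the mean.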
