Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing
Akshat Gupta, Christine Fang, Atahan Ozdemir, Maochuan Lu, Ahmed Alaa, Thomas Hartvigsen, Gopala Anumanchipalli
TL;DR
The paper addresses stability challenges in localized sequential knowledge editing for large language models, showing that the Frobenius norm of updated weight matrices grows with successive edits across various post-training interventions ($W_{new}=W_{old}+\Delta W$). This norm growth is particularly problematic when edits are localized to a subset of the model's matrices, where it correlates with downstream degradation and a reconfiguration of hidden representations. The study also finds that the norms of internal activations decline after edits, that the activations occupy different subspaces, and that their orientations become increasingly misaligned with those of the unedited model, indicating substantial shifts in representation space. These findings motivate regularization and more robust editing techniques, such as ENCORE, to enable durable, scalable updates while preserving model utility.
Abstract
This study investigates the impact of localized updates to large language models (LLMs), specifically in the context of knowledge editing, a task aimed at incorporating or modifying specific facts without altering broader model capabilities. We first show that across different post-training interventions such as continuous pre-training, full fine-tuning, and LoRA-based fine-tuning, the Frobenius norm of the updated matrices always increases. This increasing norm is especially detrimental for localized knowledge editing, where only a subset of the matrices in a model is updated. We reveal a consistent phenomenon across various editing techniques, including fine-tuning, hypernetwork-based approaches, and locate-and-edit methods: the norm of the updated matrix invariably increases with successive updates. Such growth disrupts model balance, particularly when isolated matrices are updated while the rest of the model remains static, leading to potential instability and degradation of downstream performance. On closer investigation of the intermediate activation vectors, we find that the norm of internal activations decreases and that this decrease is accompanied by shifts in the subspaces the activations occupy, which shows that these vectors now lie in completely different regions of the representation space compared to the unedited model. With this paper, we highlight the technical challenges of continuous and localized sequential knowledge editing and their implications for maintaining model stability and utility.
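To make the quantity being tracked concrete, the following is a minimal sketch of how the Frobenius norm of a single matrix could be monitored across sequential updates $W_{new}=W_{old}+\Delta W$. It is not the paper's experimental setup: the matrix size, the number of edits, and the random Gaussian updates are illustrative assumptions standing in for real editing methods, and the function name `frobenius_norm_trajectory` is hypothetical. The toy case grows because $\|W+\Delta W\|_F^2 = \|W\|_F^2 + \|\Delta W\|_F^2 + 2\langle W, \Delta W\rangle_F$, and for independent random updates the cross term averages to zero; actual editing updates need not behave like random noise.

```python
import numpy as np


def frobenius_norm_trajectory(w0, deltas):
    """Return the Frobenius norm of W before any update and after each
    sequential update W_new = W_old + delta."""
    w = w0.copy()
    norms = [float(np.linalg.norm(w, ord="fro"))]
    for delta in deltas:
        w = w + delta
        norms.append(float(np.linalg.norm(w, ord="fro")))
    return norms


# Assumed toy setup: a 64x64 matrix and 50 random updates.
# These stand in for edits; they are not produced by any editing method.
rng = np.random.default_rng(0)
w0 = rng.normal(scale=0.02, size=(64, 64))
deltas = [rng.normal(scale=0.01, size=(64, 64)) for _ in range(50)]

norms = frobenius_norm_trajectory(w0, deltas)
print(f"norm before edits: {norms[0]:.3f}")
print(f"norm after {len(deltas)} edits: {norms[-1]:.3f}")
print(f"relative growth: {norms[-1] / norms[0]:.2f}x")
```

In practice the same function could be applied to the sequence of updates produced for one edited matrix in a real model, with the untouched matrices serving as a reference for how far the edited one has drifted in scale.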
