Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models
Agnieszka Polowczyk, Alicja Polowczyk, Joanna Waczyńska, Piotr Borycki, Przemysław Spurek
TL;DR
The paper addresses the challenge of truly forgetting specific knowledge in diffusion-based generative models, where erased concepts can linger in memory and be reactivated by adversarial prompts. It introduces the Memory Self-Regeneration (MSR) task and the MemoRa strategy, a LoRA-based memory-recovery pipeline that uses DDIM inversion and spherical interpolation to recover forgotten concepts from only a few samples, without full retraining. The authors distinguish two modes of forgetting, short-term (STM) and long-term (LTM), and argue that robustness in knowledge retrieval should be a core evaluation metric for unlearning methods. They validate MemoRa on nudity, object, and style concepts and show that the AutoMemoRa and Multi-MemoRa variants can improve recall and speed. They also discuss the limitations of recovering concepts forgotten in the long-term (LTM) mode. The work highlights residual memory as a security and safety risk in unlearning and provides a practical framework for evaluating and strengthening memory-regeneration capabilities in diffusion models.
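The TL;DR names spherical interpolation as one ingredient of MemoRa but does not say how it is applied. The sketch below shows only the standard spherical linear interpolation (slerp) formula. A common use is blending two noise latents, such as those obtained by DDIM inversion of two sample images. Representing flattened latents as plain Python lists, and the function name `slerp`, are illustrative assumptions. This is not the authors' implementation.

```python
import math
from typing import List, Sequence


def slerp(p0: Sequence[float], p1: Sequence[float], t: float, eps: float = 1e-7) -> List[float]:
    """Standard spherical linear interpolation between two non-zero vectors.

    Hypothetical helper: p0 and p1 stand in for flattened latent tensors.
    t = 0 returns p0, t = 1 returns p1.
    """
    norm0 = math.sqrt(sum(a * a for a in p0))
    norm1 = math.sqrt(sum(b * b for b in p1))
    # Cosine of the angle between the two directions, clamped to [-1, 1] for numerical safety.
    cos_omega = sum(a * b for a, b in zip(p0, p1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Degenerate case (vectors parallel or anti-parallel): fall back to linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(p0, p1)]
    # Weights follow the arc between p0 and p1 instead of the straight chord.
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(p0, p1)]


# Usage: halfway between two orthogonal unit vectors stays on the unit circle.
mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
print(mid)  # both components are about 0.7071
```

Slerp is often preferred over linear interpolation for Gaussian noise latents. A straight-line blend shrinks the vector norm, which moves the latent away from the typical set that the diffusion model expects.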
Abstract
The impressive ability of modern text-to-image models to generate realistic visuals comes with a serious drawback: they can be misused to create harmful, deceptive, or unlawful content. This has accelerated the push for machine unlearning, a new field that seeks to selectively remove specific knowledge a model has learned from its training data without degrading its overall performance. However, truly forgetting a given concept turns out to be extremely difficult. When attacked with adversarial prompts, models can still generate supposedly unlearned concepts, which may be not only harmful but also illegal. In this paper, we examine the ability of models to forget and recall knowledge and introduce the Memory Self-Regeneration task. Furthermore, we present the MemoRa strategy, which we consider a regenerative approach that supports the effective recovery of previously lost knowledge. We also argue that robustness in knowledge retrieval is a crucial yet underexplored evaluation measure for developing more robust and effective unlearning techniques. Finally, we demonstrate that forgetting occurs in two distinct ways: short-term, where concepts can be quickly recalled, and long-term, where recovery is more challenging. Code is available at https://gmum.github.io/MemoRa/.
