Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models

Agnieszka Polowczyk, Alicja Polowczyk, Joanna Waczyńska, Piotr Borycki, Przemysław Spurek

TL;DR

The paper addresses the challenge of truly forgetting specific knowledge in diffusion-based generative models, where erased concepts can linger as residual memory and be reactivated by adversarial prompts. It introduces the Memory Self-Regeneration (MSR) task and the MemoRa strategy, a LoRA-based memory-recovery pipeline that uses DDIM inversion and spherical interpolation to recover forgotten concepts from only a few samples, without full retraining. The authors distinguish two modes of forgetting, short-term (concepts are quickly recalled) and long-term (recovery is harder), and argue that robustness to knowledge retrieval should be a core evaluation metric for unlearning methods. They validate MemoRa on nudity, object, and style concepts, show that AutoMemoRa and Multi-MemoRa can boost recall and speed, and discuss the limits of recovery under long-term forgetting. The work highlights residual memory as a security and safety risk in unlearning and provides a practical framework for evaluating how robust unlearned diffusion models are to memory regeneration.

Abstract

The impressive capability of modern text-to-image models to generate realistic visuals has come with a serious drawback: they can be misused to create harmful, deceptive or unlawful content. This has accelerated the push for machine unlearning. This new field seeks to selectively remove specific knowledge from a model's training data without causing a drop in its overall performance. However, it turns out that actually forgetting a given concept is an extremely difficult task. Models exposed to attacks using adversarial prompts show the ability to generate so-called unlearned concepts, which can be not only harmful but also illegal. In this paper, we present considerations regarding the ability of models to forget and recall knowledge, introducing the Memory Self-Regeneration task. Furthermore, we present the MemoRa strategy, which we consider to be a regenerative approach supporting the effective recovery of previously lost knowledge. Moreover, we propose that robustness in knowledge retrieval is a crucial yet underexplored evaluation measure for developing more robust and effective unlearning techniques. Finally, we demonstrate that forgetting occurs in two distinct ways: short-term, where concepts can be quickly recalled, and long-term, where recovery is more challenging. Code is available at https://gmum.github.io/MemoRa/.

Paper Structure

This paper contains 11 sections, 4 equations, 29 figures, 16 tables.

Figures (29)

  • Figure 1: Unlearned models may still retain residual memory of a given concept. We introduce MemoRa, a strategy for Memory Self-Regeneration, showing that even a small number of samples can trigger the recall of a forgotten concept. This finding underscores the importance of exercising greater caution when evaluating unlearning methods, as residual knowledge may pose risks in sensitive or regulated contexts. We further observe two distinct modes of forgetting: a short-term form, where concepts can be quickly recalled, and a long-term form, where recovery is slower and more demanding.
  • Figure 2: An attempt to forget the concept of nudity by the Flux-based ESD model and the application of the MemoRa strategy. Results indicate that ESD has only temporarily forgotten this concept, making it possible to quickly recover it.
  • Figure 3: Our method aims to recover unlearned information using only a few images that contain removed concepts. We first expand the training set using DDIM inversion and diversify it via spherical interpolation. Next, we fine-tune a LoRA adapter to restore the erased concept. Results reveal two types of forgetting: short-term, where knowledge is quickly recovered, and long-term, where recovery is harder. We hypothesize that short-term forgetting corresponds to superficial removal of knowledge, where concepts are not replaced but merely hidden, whereas long-term forgetting induces substantial changes in the distribution of data related to the forgotten concept, effectively altering the underlying representations.
  • Figure 4: Visualizations of the MemoRa strategy applied to the FLUX.1 [dev] model for the "parachute" concept. Notably, MemoRa achieves a highly faithful restoration of the original visual characteristics, demonstrating precise recovery of the dormant knowledge.
  • Figure 5: A Qualitative Comparison for the Restoration of Erased Concepts. UnlearnDiffAtk uses adversarial prompts to trick the model, while the MemoRa strategy focuses on knowledge recovery using the LoRA adapter.
  • ...and 24 more figures
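
The Figure 3 caption mentions diversifying the DDIM-inverted training set via spherical interpolation. As a rough illustration of that single step only, the sketch below implements standard spherical linear interpolation (slerp) between two latent vectors in plain Python. The function names `slerp` and `interpolate_latents`, the use of flat lists in place of latent tensors, and the evenly spaced interpolation weights are assumptions made for illustration; the listing here does not specify how the authors implement or parameterize this step.

```python
import math


def slerp(z0, z1, t):
    """Spherical linear interpolation between two non-zero vectors.

    z0, z1: equal-length lists of floats (stand-ins for flattened latents).
    t: interpolation weight in [0, 1]; t=0 returns z0, t=1 returns z1.
    """
    dot = sum(a * b for a, b in zip(z0, z1))
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < 1e-6:
        # Vectors are (anti)parallel: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(z0, z1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def interpolate_latents(z0, z1, num_steps):
    """Hypothetical helper: num_steps evenly spaced in-between latents."""
    return [slerp(z0, z1, k / (num_steps + 1)) for k in range(1, num_steps + 1)]


if __name__ == "__main__":
    # Two toy "latents"; in practice these would come from DDIM inversion.
    a = [1.0, 0.0, 0.0]
    b = [0.0, 1.0, 0.0]
    for z in interpolate_latents(a, b, 3):
        print([round(v, 4) for v in z])
```

Slerp is commonly preferred over straight linear mixing for Gaussian-like latents because, for inputs of equal norm, the interpolated vectors keep that norm instead of shrinking toward the origin.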