TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang, Kun Wang, Junhui Hou, Liqiang Nie

Abstract

Although text-to-image diffusion models exhibit remarkable generative power, concept erasure techniques are essential for their safe deployment to prevent the creation of harmful content. This has fostered a dynamic interplay between the development of erasure defenses and the adversarial probes designed to bypass them, and this co-evolution has progressively enhanced the efficacy of erasure methods. However, this adversarial co-evolution has converged on a narrow, text-centric paradigm that equates erasure with severing the text-to-image mapping, ignoring that the underlying visual knowledge related to undesired concepts still persists. To substantiate this claim, we investigate from a visual perspective, leveraging DDIM inversion to probe whether a generative pathway for the erased concept can still be found. However, identifying such a visual generative pathway is challenging because standard text-guided DDIM inversion is actively resisted by text-centric defenses within the erased model. To address this, we introduce TINA, a novel Text-free INversion Attack, which enforces this visual-only probe by operating under a null-text condition, thereby avoiding existing text-centric defenses. Moreover, TINA integrates an optimization procedure to overcome the accumulating approximation errors that arise when standard inversion operates without its usual textual guidance. Our experiments demonstrate that TINA regenerates erased concepts from models treated with state-of-the-art unlearning. The success of TINA proves that current methods merely obscure concepts, highlighting an urgent need for paradigms that operate directly on internal visual knowledge.
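
For reference, the source of the approximation error mentioned in the abstract can be seen in the standard deterministic DDIM update. The formulas below use common notation (cumulative schedule $\bar\alpha_t$, noise predictor $\epsilon_\theta$, condition $c$); this is an assumption for illustration, and the paper's own equations may use different symbols.

$$
z_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\frac{z_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(z_t, t, c)}{\sqrt{\bar\alpha_t}} + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon_\theta(z_t, t, c)
$$

Running this update backwards exactly would require $\epsilon_\theta(z_{t+1}, t+1, c)$, which depends on the unknown $z_{t+1}$. Standard inversion substitutes the prediction at the current latent instead:

$$
z_{t+1} \approx \sqrt{\bar\alpha_{t+1}}\,\frac{z_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(z_t, t, c)}{\sqrt{\bar\alpha_t}} + \sqrt{1-\bar\alpha_{t+1}}\,\epsilon_\theta(z_t, t, c)
$$

Each step therefore carries a small mismatch, and the mismatches accumulate along the trajectory. Setting $c = c_{\text{null}}$ gives the text-free variant of this inversion.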

Paper Structure (33 sections, 10 equations, 11 figures, 5 tables, 1 algorithm)

Figures (11)

  • Figure 1: Conceptual overview of text-centric erasure vulnerabilities and our TINA attack. Concept Erasure usually severs the link between a specific text condition and the undesired concept. Previous Attacks remain text-centric, finding adversarial text conditions to reactivate the concept. Our TINA bypasses the text pathway entirely. Using an empty text condition, it finds an initial noise that regenerates the concept, proving the visual knowledge persists in existing erased models.
  • Figure 2: The TINA (Text-free INversion Attack) framework. (a) Text-Free Inversion Attack: An optimization-based, null-text ($c_{\text{null}}$) inversion finds the unique initial noise $z_T^*$ corresponding to a target image $z_0$. This optimization corrects the errors from standard inversion. (b) Deterministic Concept Regeneration: The same sanitized model $\theta$ uses $z_T^*$ and $c_{\text{null}}$ to deterministically regenerate the target concept $z_0'$, proving the visual knowledge persists despite erasure.
  • Figure 3: Standard DDIM inversion fails to find the generative trajectory. The text-guided path ($S$) is blocked by the erasure, while the null-text path ($S^\emptyset$) drifts due to approximation errors, failing to restore the target concept.
  • Figure 4: Qualitative comparison of attack performance on (a) Nudity Erasure and (b) Style Erasure (Van Gogh). Images with a red border indicate a successful attack. Our TINA (bottom row) consistently regenerates the forbidden concepts, bypassing most defenses, while text-centric attacks fail against robust methods. Sensitive content is redacted.
  • Figure 5: t-SNE visualization of (a) the optimized initial noises $z_T^*$ and (b) their corresponding deep UNet activations (extracted from the mid_block) for four erased concepts.
  • ...and 6 more figures
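
The two stages described in the captions of Figures 2 and 3 (an inversion that drifts, an optimization that corrects it, and a deterministic regeneration from the recovered noise) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the latent is a single scalar, `toy_eps` is a hypothetical stand-in for the erased model's noise prediction under $c_{\text{null}}$, the schedule `ALPHA_BAR` is made up, and the refinement is plain gradient descent with finite-difference gradients rather than the paper's actual optimization procedure.

```python
import math

T = 10
# Hypothetical schedule: cumulative alpha-bar from 1.0 (clean) down to 0.05 (noisy).
ALPHA_BAR = [1.0 - 0.095 * t for t in range(T + 1)]


def toy_eps(z, t):
    """Toy stand-in for eps_theta(z_t, t, c_null); a real probe would call the erased UNet."""
    return math.tanh(z)


def ddim_move(z, t_from, t_to):
    """Deterministic DDIM update from t_from to t_to, using the prediction at t_from."""
    a_from, a_to = ALPHA_BAR[t_from], ALPHA_BAR[t_to]
    eps = toy_eps(z, t_from)
    x0_pred = (z - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1.0 - a_to) * eps


def generate(z_T):
    """Deterministic regeneration: denoise from t = T down to t = 0."""
    z = z_T
    for t in range(T, 0, -1):
        z = ddim_move(z, t, t - 1)
    return z


def naive_inversion(z_0):
    """Standard inversion: reuse the prediction at the current latent for each step up."""
    z = z_0
    for t in range(T):
        z = ddim_move(z, t, t + 1)
    return z


def refine_noise(z_0, z_init, iters=300, lr=0.1, h=1e-5):
    """Gradient descent on the squared reconstruction error, keeping only improving steps."""
    def loss(z):
        return (generate(z) - z_0) ** 2

    z, best = z_init, loss(z_init)
    for _ in range(iters):
        grad = (loss(z + h) - loss(z - h)) / (2.0 * h)  # central finite difference
        cand = z - lr * grad
        cand_loss = loss(cand)
        if cand_loss < best:
            z, best, lr = cand, cand_loss, lr * 1.5  # accept and grow the step size
        else:
            lr *= 0.5  # reject and shrink the step size
    return z


if __name__ == "__main__":
    z_0 = 1.5  # toy "target image" latent
    z_naive = naive_inversion(z_0)
    z_star = refine_noise(z_0, z_naive)
    print("round-trip error, naive inversion:", abs(generate(z_naive) - z_0))
    print("round-trip error, refined noise:  ", abs(generate(z_star) - z_0))
```

Because `refine_noise` only accepts steps that lower the reconstruction error, the refined noise never reconstructs the target worse than the naive starting point. In a real setting the scalar would be a latent tensor, the gradient would come from automatic differentiation, and the noise predictor would be the unlearned diffusion model evaluated with an empty prompt.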