VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
Xiang Li, Qianli Shen, Kenji Kawaguchi
TL;DR
The paper addresses the risk that probabilistic copyright protection for text-to-image diffusion models can be undermined by persistent, adversarial prompting. It introduces Virtually Assured Amplification Attack (VA3), an online framework in which an adversary iteratively selects prompts to amplify the probability of infringement. Theorem 1 guarantees near-certain success given enough interactions, provided each interaction succeeds with probability at least $\sigma > 0$. A practical Anti-NAF procedure combines adversarial prompt optimization with a loss-balanced objective to circumvent CP-k protection, and the authors implement a bandit-based online prompt-selection strategy to balance exploration and exploitation. Empirical results on the Pokemon and LAION-mi datasets show that amplification substantially raises infringement rates under CP-k, while Anti-NAF notably increases the number of infringing outputs; ablations highlight the importance of the balanced objective and of prompt design. The findings stress the need for more robust copyright protection and motivate future work on broader attack models, transferability, and stronger defenses in real-world, black-box settings.
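The bandit-based prompt selection can be illustrated with a small sketch. It uses UCB1, a standard multi-armed bandit algorithm, as a stand-in; the paper's actual selection rule may differ. The `query` oracle (submit a prompt to the protected model and return 1 if the output is judged infringing), the candidate prompts, and the hidden infringement rates are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def ucb1_prompt_selection(
    prompts: Sequence[str],
    query: Callable[[str], int],
    n_rounds: int,
    c: float = 2.0,
) -> Tuple[List[int], int]:
    """Select one candidate prompt per round with UCB1 (illustrative stand-in).

    `query(prompt)` is a hypothetical oracle: it submits the prompt to the
    protected model and returns 1 if the output is judged infringing, else 0.
    Returns (submissions per prompt, total number of infringing outputs).
    """
    if not prompts:
        raise ValueError("need at least one candidate prompt")
    k = len(prompts)
    counts = [0] * k        # times each prompt has been submitted
    successes = [0] * k     # infringing outputs observed per prompt
    for t in range(1, n_rounds + 1):
        if t <= k:
            arm = t - 1     # submit every prompt once to initialise estimates
        else:
            # Empirical success rate plus an exploration bonus that shrinks
            # as a prompt is tried more often (c = 2 gives classic UCB1).
            arm = max(
                range(k),
                key=lambda i: successes[i] / counts[i]
                + math.sqrt(c * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        successes[arm] += query(prompts[arm])
    return counts, sum(successes)


if __name__ == "__main__":
    # Hypothetical per-prompt infringement rates, hidden from the selector.
    hidden_rate = {"prompt A": 0.02, "prompt B": 0.10, "prompt C": 0.30}
    rng = random.Random(0)

    def simulated_query(prompt: str) -> int:
        return int(rng.random() < hidden_rate[prompt])

    counts, hits = ucb1_prompt_selection(list(hidden_rate), simulated_query, 2000)
    print(dict(zip(hidden_rate, counts)), "infringing outputs:", hits)
```

The exploration bonus keeps every prompt in play early on, while the empirical success rate steers most of the query budget toward whichever prompt most often slips past the protection.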
Abstract
The booming use of text-to-image generative models has raised concerns about their high risk of producing copyright-infringing content. Probabilistic copyright protection methods provide a probabilistic guarantee against such infringement. In this paper, we introduce Virtually Assured Amplification Attack (VA3), a novel online attack framework that exposes the vulnerabilities of these protection mechanisms. The proposed framework significantly amplifies the probability of generating infringing content, given sustained interactions with generative models and a non-trivial lower bound on the success probability of each engagement. Our theoretical and experimental results demonstrate the effectiveness of our approach under various scenarios. These findings highlight the potential risk of relying on probabilistic copyright protection in practical applications of text-to-image generative models. Code is available at https://github.com/South7X/VA3.
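The amplification effect can be made concrete with the standard repeated-trials bound. Assume, as a simplifying reading of the per-engagement lower bound (the exact statement of Theorem 1 may differ), that each engagement yields infringing content with probability at least $\sigma$ regardless of earlier outcomes. Then the probability of at least one infringing output in $N$ engagements is at least $1-(1-\sigma)^N$, which tends to 1 as $N$ grows. The helper names below are hypothetical.

```python
import math


def amplified_success_prob(sigma: float, n_rounds: int) -> float:
    """Lower bound on P(at least one infringing output in n_rounds),
    assuming every round succeeds with probability >= sigma given the history."""
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    return 1.0 - (1.0 - sigma) ** n_rounds


def rounds_needed(sigma: float, target: float) -> int:
    """Smallest N with 1 - (1 - sigma)^N >= target."""
    if not (0.0 < sigma < 1.0 and 0.0 <= target < 1.0):
        raise ValueError("need 0 < sigma < 1 and 0 <= target < 1")
    # Solve (1 - sigma)^N <= 1 - target for N; dividing by log(1 - sigma) < 0
    # flips the inequality, giving N >= log(1 - target) / log(1 - sigma).
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - sigma))


if __name__ == "__main__":
    n = rounds_needed(0.01, 0.99)
    print(n, amplified_success_prob(0.01, n))
```

For example, with $\sigma = 0.01$, 459 engagements suffice to push the bound to at least 0.99, whereas 458 do not. A protection that only makes each individual query unlikely to infringe therefore offers little defense against an adversary who can keep querying.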
