Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models

Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang

TL;DR

The paper tackles data poisoning risks in text-to-image diffusion models by introducing the Silent Branding Attack, a trigger-free method that covertly embeds a target logo into training data so that the logo appears in generated images even when the prompt never mentions it. The authors propose an automated pipeline (logo personalization, mask generation, inpainting with refinement) that uses style-aligned editing and iterative SDEdit to fuse logos into diverse imagery while preserving image quality and text alignment. They validate the approach on large-scale high-quality datasets and style-personalization datasets, reporting high Logo Inclusion Rates and early attack success, and show that the poisoned images evade both manual and CLIP-based screening. The work highlights ethical concerns, evaluates defenses based on set-based filtering, and discusses how the method could be repurposed for watermarking, emphasizing the need for safeguards in public data-sharing ecosystems. Overall, the study provides a complete pipeline, empirical evidence of a vulnerability, and a basis for developing practical defenses against logo-centric data poisoning in diffusion-based generative systems.

Abstract

Text-to-image diffusion models have achieved remarkable success in generating high-quality content from text prompts. However, their reliance on publicly available data and the growing trend of data sharing for fine-tuning make these models particularly vulnerable to data poisoning attacks. In this work, we introduce the Silent Branding Attack, a novel data poisoning method that manipulates text-to-image diffusion models to generate images containing specific brand logos or symbols without any text triggers. We find that when certain visual patterns appear repeatedly in the training data, the model learns to reproduce them naturally in its outputs, even without prompt mentions. Leveraging this, we develop an automated data poisoning algorithm that unobtrusively injects logos into original images, ensuring they blend naturally and remain undetected. Models trained on this poisoned dataset generate images containing logos without degrading image quality or text alignment. We experimentally validate our silent branding attack across two realistic settings on large-scale high-quality image datasets and style personalization datasets, achieving high success rates even without a specific text trigger. Human evaluation and quantitative metrics including logo detection show that our method can stealthily embed logos.
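
The TL;DR reports attack success as a Logo Inclusion Rate. As a rough sketch only, assuming that metric is the fraction of generated images in which a logo detector fires (this page does not give the paper's exact definition), it could be computed as follows. The function name `logo_inclusion_rate` and the `contains_logo` callable are hypothetical placeholders, not the authors' code or any library's API.

```python
from typing import Callable, Iterable, TypeVar

Image = TypeVar("Image")


def logo_inclusion_rate(
    images: Iterable[Image],
    contains_logo: Callable[[Image], bool],
) -> float:
    """Return the fraction of images flagged by the detector.

    ASSUMPTION: this definition of the metric is illustrative; the
    detector is any caller-supplied function returning True when the
    target logo is found in an image.
    """
    flags = [bool(contains_logo(img)) for img in images]
    if not flags:
        raise ValueError("no images to evaluate")
    return sum(flags) / len(flags)


# Toy usage: strings stand in for generated images, and a substring
# check stands in for a real logo detector.
generated = ["street with logo", "plain mountain", "logo on a cup", "empty room"]
rate = logo_inclusion_rate(generated, lambda img: "logo" in img)
```

In practice the detector would wrap whatever logo detection model is used for evaluation; keeping it as a plain callable leaves that choice outside the metric itself.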

Paper Structure

This paper contains 66 sections, 2 equations, 30 figures, 5 tables, 3 algorithms.

Figures (30)

  • Figure 1: Silent branding attack scenario. (Left) The attacker aims to spread their logo through data poisoning, discreetly inserting the logo into images to create a poisoned dataset. (Middle) The poisoned dataset is uploaded to data-sharing communities. (Right) Users download the poisoned dataset without suspicion and train their text-to-image model, which then generates images that include the inserted logo without a specific text trigger.
  • Figure 2: (a) Training images of a toy (poop emoji) placed in various locations and styles, paired with text prompts that do not describe the toy. (b) Generated images from the model trained on images of the toy. Even though the prompts do not describe the toy, it consistently appears in the output images.
  • Figure 3: Overview of our automatic poisoning algorithm consisting of three stages—logo personalization, mask generation, and inpainting & refinement. Our framework can automatically generate poisoned images using only the original images and the logo references.
  • Figure 4: Overview of the main modules in our automatic poisoning algorithm. (a) The logo detection module identifies the target logo when it is known. (b) The logo refinement module enhances fine details of the detected logo, such as the eyes in the logo.
  • Figure 6: Examples of visualizations from our poisoned dataset.
  • ...and 25 more figures