Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable

Haozhe Liu, Wentian Zhang, Bing Li, Bernard Ghanem, Jürgen Schmidhuber

TL;DR

The paper addresses ownership protection for diffusion models under downstream fine-tuning, a setting in which backdoor watermarks are typically erased. It introduces Arbitrary-In-Arbitrary-Out (AIAO), which embeds mask-controlled, feature-space triggers across randomly sampled subpaths in lazy layers, reducing dependence on the busy layers that are likely to change during fine-tuning. By combining a three-term training objective with Monte Carlo verification over subpaths, the method achieves robust ownership verification with minimal impact on image quality across text-to-image and unconditional diffusion models, and it outperforms prior watermarking approaches. The approach has practical implications for accountability and safety regulation in generative AI, since it allows provenance to be verified even after a model has been adapted.

Abstract

Foundational generative models should be traceable to protect their owners and facilitate safety regulation. To achieve this, traditional approaches embed identifiers based on supervisory trigger-response signals, which are commonly known as backdoor watermarks. They are prone to failure when the model is fine-tuned with nontrigger data. Our experiments show that this vulnerability is due to energetic changes in only a few 'busy' layers during fine-tuning. This yields a novel arbitrary-in-arbitrary-out (AIAO) strategy that makes watermarks resilient to fine-tuning-based removal. The trigger-response pairs of AIAO samples across various neural network depths can be used to construct watermarked subpaths, employing Monte Carlo sampling to achieve stable verification results. In addition, unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths, where a mask-controlled trigger function is proposed to preserve the generation performance and ensure the invisibility of the embedded backdoor. Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO; while the verification rates of other trigger-based methods fall from ~90% to ~70% after fine-tuning, those of our method remain consistently above 90%.
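
The Monte Carlo verification mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of samples, and the toy `toy_responds` check (which simply pretends that subpaths ending at one "busy" layer have lost the watermark) are all assumptions made for illustration. In a real setting, `responds(start, end)` would inject the trigger at layer `start` and test for the expected response at layer `end`.

```python
import random


def sample_subpath(num_layers, rng):
    """Pick a random (start, end) pair of layer indices with start < end."""
    start, end = sorted(rng.sample(range(num_layers), 2))
    return start, end


def estimate_vs_rate(responds, num_layers, num_samples=1000, seed=0):
    """Monte Carlo estimate of the fraction of sampled subpaths that respond.

    `responds(start, end)` is a hypothetical callable returning True when the
    trigger embedded at layer `start` produces the expected response at `end`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        start, end = sample_subpath(num_layers, rng)
        hits += bool(responds(start, end))
    return hits / num_samples


# Toy stand-in for a real model check (assumption): the watermark is lost
# only on subpaths that end at a single heavily updated layer.
BUSY_LAYERS = {7}


def toy_responds(start, end):
    return end not in BUSY_LAYERS


rate = estimate_vs_rate(toy_responds, num_layers=12)
print(f"estimated verification rate: {rate:.2f}")
```

Averaging over many sampled subpaths is what makes the estimate stable: a single subpath that passes through a busy layer may fail, but the aggregate rate changes little.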

Paper Structure (28 sections, 18 equations, 15 figures, 17 tables)

This paper contains 28 sections, 18 equations, 15 figures, 17 tables.

Figures (15)

  • Figure 1: Illustration of our motivation. (a) We embed a backdoor-based watermark [peng2023protecting] in the source diffusion model (DM) [rombach2022high] (left), where, given a trigger input, the watermark is activated and inserts "DIFF" into the generated images. (b) We fine-tune the backdoored source DM on the downstream dataset AFHQ-Cat [choi2020stargan]. The first row shows the results for a normal input at every 100 training steps, and the second row shows the results for a trigger input. The embedded watermark inherited from the source DM is gradually erased as the number of fine-tuning steps increases, which poses a challenge to ownership protection of the source DM. To address this, we propose a backdoor that is robust to downstream fine-tuning for traceable ownership protection of DMs.
  • Figure 2: The ability of a generative model to learn new knowledge can be concentrated in a few critical layers. In our pilot study, we fine-tune a pre-trained Stable Diffusion model to generate dog images when given "A Cat" as the input signal. This mapping was never part of the model's regular training data, so it is a novel source of knowledge for the generative model. We track the changes caused by learning this knowledge and observe that the density of parameter changes is nearly zero, indicating that the majority of model layers are lazy to update.
  • Figure 3: Impact of busy/lazy layers on generative performance. The source DM is a cat-generation model and is fine-tuned on the AFHQ-Dog dataset [choi2020stargan] to obtain the target DM. By replacing the parameter values of the top 50 busy layers in the source model with their corresponding values in the target model, the model achieves performance comparable to that of the target model. In other words, most of the change in generative performance originates from a few layers.
  • Figure 4: Pipeline of the proposed method. Our method randomly selects two layers for trigger embedding and response activation. The trigger function maps the selected elements of the feature to pre-defined signs. Note that we only change the signs rather than the absolute values.
  • Figure 5: Verification success rate (VS-rate) and response success rate (RS-rate) of different methods in protecting source DMs before fine-tuning on a downstream generation task. The base model is the text-to-image DM [rombach2022high].
  • ...and 10 more figures
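
The caption of Figure 4 describes a trigger function that sets selected feature elements to pre-defined signs while leaving their absolute values unchanged. The following is a minimal sketch of that idea on a flat list of numbers. The function names, the flat-list representation, and the example values are assumptions for illustration and are not taken from the paper's code.

```python
def apply_sign_trigger(feature, mask, signs):
    """Force masked elements to the pre-defined sign, keeping their magnitude.

    feature: list of floats (a flattened feature map, for illustration)
    mask:    list of 0/1 flags choosing which elements carry the trigger
    signs:   list of +1/-1 targets; entries at unmasked positions are ignored
    Note: an element that is exactly zero stays zero and carries no sign.
    """
    out = []
    for value, selected, sign in zip(feature, mask, signs):
        if selected:
            out.append(abs(value) * sign)
        else:
            out.append(value)
    return out


def has_trigger(feature, mask, signs):
    """Return True when every masked element already has its target sign."""
    return all(
        value * sign > 0
        for value, selected, sign in zip(feature, mask, signs)
        if selected
    )


# Hypothetical example values.
feature = [0.5, -1.2, 3.0, -0.7]
mask = [1, 0, 1, 1]
signs = [-1, 1, 1, 1]

triggered = apply_sign_trigger(feature, mask, signs)
print(triggered)
print(has_trigger(feature, mask, signs), has_trigger(triggered, mask, signs))
```

Because only signs are altered, the magnitudes of the feature stay the same, which is consistent with the stated goal of preserving generation performance and keeping the backdoor invisible.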