Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach

Xincheng Wang, Hanchi Sun, Wenjun Sun, Kejun Xue, Wangqiu Zhou, Jianbo Zhang, Wei Sun, Dandan Zhu, Xiongkuo Min, Jun Jia, Zhijun Fang

TL;DR

This paper formalizes a universal threat model and a three-dimensional evaluation framework (Universality, Transmissibility, Robustness) to benchmark dataset watermarking methods used for tracing diffusion-model fine-tuning. Through a comprehensive benchmark, it shows existing methods vary in cross-task applicability and traceability, with some robustness to common distortions but vulnerability to targeted removal. To expose these weaknesses, the authors introduce DeAttack, a unified watermark-removal framework that combines multi-domain degradation with high-quality restoration to erase watermark signals while preserving perceptual quality. The findings highlight a critical gap in current designs and emphasize the need for adversary-aware, robust watermarking techniques for diffusion-model traceability and copyright protection.

Abstract

Recent fine-tuning techniques for diffusion models enable them to reproduce specific image sets, such as particular faces or artistic styles, but also introduce copyright and security risks. Dataset watermarking has been proposed to ensure traceability by embedding imperceptible watermarks into training images, which remain detectable in outputs even after fine-tuning. However, current methods lack a unified evaluation framework. To address this, this paper establishes a general threat model and introduces a comprehensive evaluation framework encompassing Universality, Transmissibility, and Robustness. Experiments show that existing methods perform well in universality and transmissibility, and exhibit some robustness against common image processing operations, yet still fall short under real-world threat scenarios. To reveal these vulnerabilities, the paper further proposes a practical watermark removal method that fully eliminates dataset watermarks without affecting fine-tuning, highlighting a key challenge for future research.

Paper Structure

This paper contains 14 sections, 14 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Visualization results of the four fine-tuning methods on three datasets. The first column shows results without training the text encoder, the second column shows results with the text encoder trained, and the third column gives the corresponding prompt for each sample.
  • Figure 2: Overview of the threat model. Image Owners embed binary watermarks into datasets to establish ownership and ensure traceability. Upon acquiring the data, Image Users may generate customized images using various fine-tuning or model adaptation techniques. If the original watermark is successfully detected in the generated images, the protection mechanism is deemed effective; otherwise, it is considered to have failed.
  • Figure 3: The visualization of generation results after applying natural distortion. The figure indicates the optimal FID score and CLIP-T similarity for each fine-tuning approach.
  • Figure 4: The architecture of DeAttack, a unified watermark-removal framework built on image degradation and restoration processes.
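
The success criterion in the threat model of Figure 2 (protection holds if the owner's binary watermark is detected in the generated images) can be illustrated with a minimal sketch. It assumes that some watermark decoder has already produced one bit string per generated image. The function names, the averaging rule, and the 0.9 threshold are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import Sequence, Tuple


def bit_accuracy(embedded: Sequence[int], extracted: Sequence[int]) -> float:
    """Fraction of bit positions where the extracted watermark matches the embedded one."""
    if len(embedded) == 0 or len(embedded) != len(extracted):
        raise ValueError("watermarks must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(embedded, extracted) if a == b)
    return matches / len(embedded)


def protection_effective(
    embedded: Sequence[int],
    extracted_per_image: Sequence[Sequence[int]],
    threshold: float = 0.9,  # assumed decision threshold, not from the paper
) -> Tuple[bool, float]:
    """Decide whether the watermark survived fine-tuning.

    Averages bit accuracy over all generated images (an assumed rule) and
    compares the mean against the threshold.
    """
    if len(extracted_per_image) == 0:
        raise ValueError("need at least one generated image")
    accuracies = [bit_accuracy(embedded, bits) for bits in extracted_per_image]
    mean_accuracy = sum(accuracies) / len(accuracies)
    return mean_accuracy >= threshold, mean_accuracy


if __name__ == "__main__":
    owner_bits = [1, 0, 1, 1, 0, 0, 1, 0]
    # Bits decoded from two generated images: one exact, one with a single flipped bit.
    decoded = [
        [1, 0, 1, 1, 0, 0, 1, 0],
        [1, 0, 1, 1, 0, 0, 1, 1],
    ]
    detected, accuracy = protection_effective(owner_bits, decoded)
    print(detected, accuracy)
```

Under this reading, a removal attack succeeds when the mean bit accuracy on generated images falls below the chosen threshold while image quality is preserved.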