RoboSignature: Robust Signature and Watermarking on Network Attacks

Aryaman Shaan; Garvit Banga; Raghav Mantri

TL;DR

This work investigates the robustness of watermarking in Latent Diffusion Models (LDMs) and shows that it is vulnerable to network-level adversarial fine-tuning. It introduces the Random Key Attack and the Gradual Random Key Attack, which exploit the watermark key space of size $2^{48}$ to corrupt or mis-embed watermarks. To counter these threats, the authors adapt Tamper Resistant Fine-Tuning (TAR) from LLMs to LDMs, yielding a defense that preserves image quality and partially maintains watermark integrity. In the experiments, the modified TAR achieves high PSNR and watermark accuracy on evaluation data, but aggressive attacks still reduce bit-traceability, underscoring the need for stronger, principled defenses for watermarking in diffusion models. The findings have practical implications for securely open-sourcing watermarked generative systems and point to future research on tamper-resistant defenses for diffusion-based content generation.

Abstract

Generative models have made it easy to create images of all kinds from a single prompt. However, this has also raised ethical concerns about distinguishing content actually created by humans or captured by cameras from model-generated images or videos. Watermarking the output of modern generative models is a popular way to provide information about the source of the content. The goal is for every generated image to carry an invisible watermark that allows later detection or identification. Stable Signature fine-tunes the decoder of Latent Diffusion Models (LDMs) so that a unique watermark is rooted in any image the decoder produces. In this paper, we present a novel adversarial fine-tuning attack that disrupts the model's ability to embed the intended watermark, exposing a significant vulnerability in existing watermarking methods. To address this, we further propose a tamper-resistant fine-tuning algorithm, inspired by methods developed for large language models and tailored to the specific requirements of watermarking in LDMs. Our findings emphasize the importance of anticipating and defending against potential vulnerabilities in generative systems.

Paper Structure (17 sections, 2 equations, 1 figure, 3 tables, 1 algorithm)

Introduction
Related Work
Image Generation
Watermarking
HiDDeN
Stable Signature
Network Level Attacks
Tamper Resistant Fine-Tuning in LLMs (TAR)
Methods
Proposed Attacks
Modified TAR Fine-Tuning
Experimental Results
Results of Proposed Attacks
Results of Modified TAR fine-tuning
Conclusion
...and 2 more sections

Figures (1)

Figure 1: Bit Accuracy vs PSNR
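
The two quantities on the axes of Figure 1 have standard definitions, which can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, images are assumed to be flattened into lists of pixel values in [0, 255], and the 48-bit key length is an assumption taken from the $2^{48}$ key space mentioned in the TL;DR.

```python
import math
import random

# Assumed key length, matching the 2^48 key space mentioned in the TL;DR.
KEY_BITS = 48


def bit_accuracy(decoded, target):
    """Fraction of positions where the decoded bits equal the target key bits."""
    if len(decoded) != len(target):
        raise ValueError("bit sequences must have the same length")
    matches = sum(1 for d, t in zip(decoded, target) if d == t)
    return matches / len(target)


def psnr(original, modified, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(modified):
        raise ValueError("pixel sequences must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


def random_key(rng, n_bits=KEY_BITS):
    """Draw a uniformly random key as a list of 0/1 bits (hypothetical helper)."""
    return [rng.randint(0, 1) for _ in range(n_bits)]
```

Under these definitions, a decoded message that matches the intended key in every position scores a bit accuracy of 1.0, while an unrelated uniformly random key is expected to match about half of the bits, so values near 0.5 indicate that the intended watermark is no longer recoverable.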
