Watermarking across Modalities for Content Tracing and Generative AI

Pierre Fernandez

TL;DR

This work surveys watermarking across images, audio, and text, detailing active and generation-time strategies to trace content and monitor model usage in the GenAI era. It develops latent-space image watermarking based on self-supervised learning (SSL), active indexing to improve retrieval, watermarks rooted in latent diffusion models for tracing the originating model, and proactive localized watermarks against voice cloning. It also introduces robust statistical tests and detection schemes for LLM watermarking, and extends to model radioactivity, enabling detection of watermark traces in downstream fine-tuned models even with partial data access. Collectively, the thesis provides practical, scalable methods for content provenance, model monitoring, and policy-compliant content generation, while addressing limitations and adversarial scenarios. The results demonstrate robust detection, localization, and attribution across modalities, with significant implications for content moderation, IP protection, and governance in AI technologies.

Abstract

Watermarking embeds information into digital content such as images, audio, or text in a way that is imperceptible to humans but robustly detectable by specific algorithms. This technology addresses several industry challenges, such as content moderation, tracing AI-generated content, and monitoring the usage of AI models. The contributions of this thesis include the development of new watermarking techniques for images, audio, and text. We first introduce methods for active moderation of images on social platforms. We then develop specific techniques for AI-generated content. Specifically, we demonstrate methods to adapt latent generative models to embed watermarks in all generated content, identify watermarked sections in speech, and improve watermarking in large language models with tests that ensure low false positive rates. Furthermore, we explore the use of digital watermarking to detect model misuse, including the detection of watermarks in language models fine-tuned on watermarked text, and introduce training-free watermarks for the weights of large transformers. Through these contributions, the thesis provides effective solutions for the challenges posed by the increasing use of generative AI models and the need for model monitoring and content moderation. Finally, it examines the challenges and limitations of watermarking techniques and discusses potential future directions for research in this area.

Paper Structure

This paper contains 246 sections, 74 equations, 68 figures, 44 tables, 4 algorithms.

Figures (68)

  • Figure 1: Pieces of paper with imprinted watermarks, which appear when light shines from behind because the paper is thinner in the places where a wire was. Historians now use these watermarks to date the paper [endersby2023patterns].
  • Figure 2: A dandy roll machine with a watermark used to create paper. https://commons.wikimedia.org/wiki/File:Dandy_roll_2.jpg.
  • Figure 3: Annual number of papers on watermarking, from IEEE Xplore and arXiv (number for 2024 is extrapolated). We can identify the rise of watermarking in the 1990s (with the popularization of DVDs), its stabilization and slight decrease in the 2010s, and its resurgence in the 2020s, which can be attributed to the popularization of generative models, e.g., image models [karras2020analyzing, ho2020denoising] and LLMs like GPT-3 [brown2020language].
  • Figure 4: (Left) A viral image of Pope Francis generated from text by Midjourney, which became famous as an example of a deepfake. (Right) Frames from video clips generated by OpenAI's Sora. The fact that AI-generated content is visually indistinguishable from human-generated content highlights the need for more robust detection schemes like watermarking.
  • Figure 5: Example of a visible watermark, and how it can be removed in less than 10 seconds using free software (Adobe Firefly in this example).
  • ...and 63 more figures

Theorems & Definitions (2)

  • Definition 1: Text Radioactivity
  • Definition 2: Model Radioactivity