Watermarking across Modalities for Content Tracing and Generative AI
Pierre Fernandez
TL;DR
This thesis surveys watermarking across images, audio, and text, detailing active and generation-time strategies to trace content and monitor model usage in the GenAI era. It develops SSL-based latent-space watermarking for images, active indexing to improve retrieval, watermarks rooted in latent diffusion models for model-origin tracing, and proactive localized watermarks for detecting voice cloning. It also introduces robust statistical tests and detection schemes for LLM watermarking, and extends to model radioactivity, which enables detection of watermark traces in downstream fine-tuned models even with partial data access. Collectively, the thesis provides practical, scalable methods for content provenance, model monitoring, and policy-compliant content generation, while addressing limitations and adversarial scenarios. The results demonstrate robust detection, localization, and attribution across modalities, with implications for content moderation, IP protection, and the governance of AI technologies.
Abstract
Watermarking embeds information into digital content such as images, audio, or text in a way that is imperceptible to humans but robustly detectable by specific algorithms. This technology has important applications to several industry challenges, such as content moderation, tracing AI-generated content, and monitoring the usage of AI models. The contributions of this thesis include the development of new watermarking techniques for images, audio, and text. We first introduce methods for active moderation of images on social platforms. We then develop techniques tailored to AI-generated content: we demonstrate methods to adapt latent generative models so that all generated content carries a watermark, to identify watermarked sections in speech, and to improve watermarking in large language models with tests that ensure low false positive rates. Furthermore, we explore the use of digital watermarking to detect model misuse, including the detection of watermarks in language models fine-tuned on watermarked text, and introduce training-free watermarks for the weights of large transformers. Through these contributions, the thesis provides effective solutions to the challenges posed by the increasing use of generative AI models and the need for model monitoring and content moderation. Finally, it examines the challenges and limitations of watermarking techniques and discusses potential future directions for research in this area.
