LLM Fingerprinting via Semantically Conditioned Watermarks
Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev
TL;DR
This paper tackles the problem of proving ownership of open-weight LLMs by moving from brittle, fixed query-key fingerprints to a robust, stealthy paradigm based on semantically conditioned watermarks. By selecting a high-entropy semantic domain (e.g., French) and diffusing a statistical watermark signal across each response, the method enables reliable fingerprint detection even after deployment changes such as finetuning, quantization, or pruning. The authors implement watermark distillation within the semantic domain and preserve non-domain behavior with a regularization term, achieving strong detection while maintaining model utility; detection scales with the number of concatenated responses, and extensive evaluations show robustness against 25 deployment scenarios and 5 targeted adversaries. The work provides a practical, provable approach to model provenance with broad implications for licensing, accountability, and reproducibility in LLM deployment, while acknowledging domain-design tradeoffs and potential misuse concerns.
Abstract
Most LLM fingerprinting methods teach the model to respond to a few fixed queries with predefined atypical responses (keys). This memorization often does not survive common deployment steps such as finetuning or quantization, and such keys can be easily detected and filtered from LLM responses, ultimately breaking the fingerprint. To overcome these limitations, we introduce LLM fingerprinting via semantically conditioned watermarks, replacing fixed query sets with a broad semantic domain, and replacing brittle atypical keys with a statistical watermarking signal diffused throughout each response. After teaching the model to watermark its responses only to prompts from a predetermined domain (e.g., the French language), the model owner can use queries from that domain to reliably detect the fingerprint and verify ownership. As we confirm in our thorough experimental evaluation, our fingerprint is both stealthy and robust to all common deployment scenarios.
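To make the detection step concrete, the following is a minimal sketch of how an owner might test concatenated in-domain responses for a statistical watermark. It assumes a red-green style watermark scored with a one-sided z-test, which this summary does not specify; the names `is_green`, `detection_z_score`, `p_value`, the secret key, and the green fraction `GAMMA` are all hypothetical and chosen only for illustration.

```python
import hashlib
import math

GAMMA = 0.25  # assumed fraction of the vocabulary treated as "green"


def is_green(prev_token: int, token: int, key: str = "owner-secret") -> bool:
    """Hypothetical keyed hash: is `token` green given the preceding token?"""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    # Map the first 8 bytes to [0, 1) and compare against the green fraction.
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def detection_z_score(responses: list[list[int]], key: str = "owner-secret") -> float:
    """Pool green-token counts over all responses and return a z-score.

    Under the null hypothesis (no watermark), each scored token is green
    with probability GAMMA, so the pooled count is approximately normal.
    """
    green = 0
    scored = 0
    for tokens in responses:
        # Score each token against its predecessor within the same response.
        for prev, tok in zip(tokens, tokens[1:]):
            green += is_green(prev, tok, key)
            scored += 1
    if scored == 0:
        return 0.0
    return (green - GAMMA * scored) / math.sqrt(GAMMA * (1 - GAMMA) * scored)


def p_value(z: float) -> float:
    """One-sided p-value of the z-score under the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Because the counts are pooled, a fixed per-token bias toward green tokens yields a z-score that grows roughly with the square root of the number of scored tokens, which is one way to read the claim that detection improves as more responses are concatenated. In this sketch the owner would query the model only with prompts from the chosen domain and reject the null hypothesis when the p-value falls below a preset threshold.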
