SoK: Are Watermarks in LLMs Ready for Deployment?

Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen, Ruoming Jin, Abdallah Khreishah, My T. Thai

TL;DR

This SoK formalizes watermarking for LLMs as a defense against model stealing and IP misuse. It provides a taxonomy of watermark (WM) generators and IP checkers, plus a cross-model IP classifier to assess WM effectiveness across diverse LLMs. Through large-scale experiments across multiple LLMs and watermarking schemes, the work shows that WMs can markedly improve IP differentiation but often degrade generation quality and downstream task performance, while remaining vulnerable to removal and spoofing attacks. The paper highlights critical gaps between promising research results and real-world deployment, emphasizing the need for robust, scalable, and verifiably safe watermarking solutions. Overall, the study offers practical insights and a roadmap for developing watermarking techniques that balance IP protection with model utility in real-world LLM ecosystems.

Abstract

Large Language Models (LLMs) have transformed natural language processing, demonstrating impressive capabilities across diverse tasks. However, deploying these models introduces critical risks related to intellectual property violations and potential misuse, particularly as adversaries can imitate these models to steal services or generate misleading outputs. We specifically focus on model stealing attacks, as they are highly relevant to proprietary LLMs and pose a serious threat to their security, revenue, and ethical deployment. While various watermarking techniques have emerged to mitigate these risks, it remains unclear how far the community and industry have progressed in developing and deploying watermarks in LLMs. To bridge this gap, we aim to develop a comprehensive systematization for watermarks in LLMs by 1) presenting a detailed taxonomy for watermarks in LLMs, 2) proposing a novel intellectual property classifier to explore the effectiveness and impacts of watermarks on LLMs in both attack and attack-free environments, 3) analyzing the limitations of existing watermarks in LLMs, and 4) discussing practical challenges and potential future directions for watermarks in LLMs. Through extensive experiments, we show that despite promising research outcomes and significant attention from leading companies and the community to deploying watermarks, these techniques have yet to reach their full potential in real-world applications because of their unfavorable impact on the utility of LLMs and on downstream tasks. Our findings provide an insightful understanding of watermarks in LLMs, highlighting the need for practical watermarking solutions tailored to LLM deployment.

Paper Structure

This paper contains 31 sections, 2 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Categories of Large Language Models.
  • Figure 2: Model Stealing Attacks [jovanovic2024watermark, dang2025delta].
  • Figure 3: Taxonomy of Watermarking Mechanisms for LLMs.
  • Figure 4: US LLM Market Trend from 2020 to 2030 [industry].
  • Figure 5: Native IP checker works well in simple scenarios (left), but struggles to differentiate between outputs from multiple LLMs (center), highlighting the need for a cross-model IP checker (right).
  • ...and 7 more figures