A Survey of Text Watermarking in the Era of Large Language Models

Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, Philip S. Yu

TL;DR

This survey analyzes text watermarking in the era of large language models, detailing techniques for existing text and LLM-generated text, evaluation metrics, and real-world applications. It presents a comprehensive taxonomy spanning format/lexical/syntactic/generation-based methods, and logits/token-sampling/inference-time strategies for LLM watermarking, along with training-time approaches. The paper surveys detectability, quality impact, robustness to untargeted and targeted attacks, and benchmarks/tools, highlighting trade-offs and future directions such as public verifiability, open-source robustness, and low-burden deployment. By consolidating methods, metrics, and benchmarks, it provides a foundation for developing robust, scalable watermarking solutions that protect copyright and support AI-generated content detection in practical contexts.

Abstract

Text watermarking algorithms are crucial for protecting the copyright of textual content. Historically, their capabilities and application scenarios were limited. However, recent advancements in large language models (LLMs) have revolutionized these techniques. LLMs not only enhance text watermarking algorithms with their advanced abilities but also create a need for employing these algorithms to protect their own copyrights or prevent potential misuse. This paper conducts a comprehensive survey of the current state of text watermarking technology, covering four main aspects: (1) an overview and comparison of different text watermarking techniques; (2) evaluation methods for text watermarking algorithms, including their detectability, their impact on text or LLM quality, and their robustness under targeted or untargeted attacks; (3) potential application scenarios for text watermarking technology; (4) current challenges and future directions for text watermarking. This survey aims to provide researchers with a thorough understanding of text watermarking technology in the era of LLMs, thereby promoting its further advancement.

Paper Structure

This paper contains 82 sections, 9 equations, 11 figures, and 2 tables.

Figures (11)

  • Figure 1: Relationships between the development of text watermarking techniques and LLMs.
  • Figure 2: An overview of text watermarking techniques, categorized into two main types: watermarking for existing text and watermarking for LLMs.
  • Figure 3: Taxonomy of text watermarking methods.
  • Figure 4: Taxonomy of watermarking for existing text.
  • Figure 5: A more illustrative description of the KGW algorithm (Kirchenbauer et al., ICML 2023).
  • ...and 6 more figures