Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond

Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Yu Qiao, Li Li, Fei-Yue Wang

TL;DR

This work surveys watermarked LLMs as a foundation for robust identity recognition in a growing AI ecosystem. It introduces an information-theoretic framework that formalizes watermarking through the objective $I(S^N;W) - \lambda I(m;W)$ with constraints, and classifies generation, embedding, extraction, and reconstruction methods across data-level and model-level implementations. The paper provides a comprehensive taxonomy of watermarking techniques (vocabulary-partitioning, model-learning, and custom-rules), a wide-ranging evaluation schema (including success rate, confidence, complexity, text quality, transparency, information density, robustness, unforgeability, cross-lingual consistency, and radioactivity), and strategic guidance for multi-stakeholder governance. By detailing attacker models and practical deployment considerations, it outlines how rich information watermarks, cryptographic verification, and integrated identification technologies can enable traceability, accountability, and trust in LLM-enabled applications. Together, these contributions aim to advance secure, transparent, and equitable LLM ecosystems suitable for real-world governance and protection of intellectual property.

Abstract

Large Language Models (LLMs) are increasingly integrated into diverse industries, posing substantial security risks due to unauthorized replication and misuse. To mitigate these concerns, robust identification mechanisms are widely acknowledged as an effective strategy. Identification systems for LLMs now rely heavily on watermarking technology to manage and protect intellectual property and ensure data security. However, previous studies have primarily concentrated on the basic principles of algorithms and lacked a comprehensive analysis of watermarking theory and practice from the perspective of intelligent identification. To bridge this gap, firstly, we explore how a robust identity recognition system can be effectively implemented and managed within LLMs by various participants using watermarking technology. Secondly, we propose a mathematical framework based on mutual information theory, which systematizes the identification process to achieve more precise and customized watermarking. Additionally, we present a comprehensive evaluation of performance metrics for LLM watermarking, reflecting participant preferences and advancing discussions on its identification applications. Lastly, we outline the existing challenges in current watermarking technologies and theoretical frameworks, and provide directional guidance to address these challenges. Our systematic classification and detailed exposition aim to enhance the comparison and evaluation of various methods, fostering further research and development toward a transparent, secure, and equitable LLM ecosystem.

Paper Structure (48 sections, 18 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Evolution of application systems for LLMs: transitioning from a centralized system focused on model technology service providers to a multi-centric system emphasizing identity verification and behavior traceability.
  • Figure 2: The watermarking technology framework in LLMs. The watermark message $m$ is used to identify the specific LLM. The security key $K^D$ represents the privacy identity tag used to generate and reconstruct the watermark. The watermark attack channels are designed to simulate attacks such as semantic substitutions and sequence changes that watermarked texts encounter during transmission.
  • Figure 3: The overview of watermark algorithms in LLMs.
  • Figure 4: Watermark generation through vocabulary partitioning. Utilizing a hash function, the previous token is used as input to compute a random seed, which divides the vocabulary into green and red lists. The LLM-generated token bias is applied by adding a bias term to the token log probabilities, favoring the green list.
  • Figure 5: Watermark embedding through modifying logits generation.
  • ...and 2 more figures
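
The vocabulary-partitioning step described in the caption of Figure 4 can be illustrated with a minimal sketch. It is not the surveyed authors' implementation: the function names `green_list` and `bias_logits`, the SHA-256 keyed hash, the green-list fraction `gamma`, the bias strength `delta`, and the demo key are all assumptions introduced here for illustration.

```python
import hashlib
import random


def green_list(prev_token_id, vocab_size, gamma=0.5, key=b"demo-key"):
    """Return the set of 'green' token ids derived from the previous token.

    Assumption: a keyed SHA-256 digest of the previous token id serves as
    the hash function; gamma is the fraction of the vocabulary marked green.
    """
    digest = hashlib.sha256(key + prev_token_id.to_bytes(8, "big")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)          # deterministic for a given seed
    ids = list(range(vocab_size))
    rng.shuffle(ids)                   # pseudo-random vocabulary permutation
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(logits, prev_token_id, delta=2.0, gamma=0.5, key=b"demo-key"):
    """Add the bias term delta to every green-list logit; red logits are unchanged."""
    green = green_list(prev_token_id, len(logits), gamma, key)
    return [x + delta if i in green else x for i, x in enumerate(logits)]


if __name__ == "__main__":
    toy_logits = [0.0] * 10            # hypothetical 10-token vocabulary
    print(bias_logits(toy_logits, prev_token_id=3))
```

Because the partition depends only on the key and the previous token, anyone holding the key can recompute the green list for each position of a text, which is what makes the favored tokens usable as a watermark signal.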