CEFW: A Comprehensive Evaluation Framework for Watermark in Large Language Models

Shuhao Zhang, Bo Cheng, Jiale Han, Yuli Chen, Zhixuan Wu, Changbao Li, Pingli Gu

TL;DR

This work introduces the Comprehensive Evaluation Framework for Watermark in Large Language Models (CEFW), a unified scheme that assesses watermark methods along five axes: detectability, text quality, usability, robustness, and imperceptibility, and computes a comprehensive score via $S_{CEFW} = \frac{1}{6}S_{D} + \frac{1}{6}S_{T} + \frac{1}{6}S_{U} + \frac{1}{4}S_{R} + \frac{1}{4}S_{I}$. It also presents Balanced Watermark (BW), a watermark design that balances dynamic and static partitioning to enhance performance across criteria. In experiments on C4 and Quora-QA with OPT-2.7b and Llama3-8b, BW achieves the highest comprehensive scores and demonstrates superior text quality and robustness relative to KGW and UNIW. The authors provide open-source code to enable flexible benchmarking and adaptation by LLM providers and researchers. This framework and BW offer practical, evaluable benchmarks for deploying watermarking in real-world LLM services.
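The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' released code: the function name `cefw_score`, the dictionary keys, and the example sub-scores are hypothetical, and it assumes the five sub-scores have already been normalized to a common scale. Only the default weights come from the formula.

```python
import math
from fractions import Fraction

# Default weights copied from the S_CEFW formula in the TL;DR.
# The dictionary keys are hypothetical labels for S_D, S_T, S_U, S_R, S_I.
DEFAULT_WEIGHTS = {
    "detectability": Fraction(1, 6),
    "text_quality": Fraction(1, 6),
    "usability": Fraction(1, 6),
    "robustness": Fraction(1, 4),
    "imperceptibility": Fraction(1, 4),
}


def cefw_score(scores, weights=DEFAULT_WEIGHTS):
    """Return the weighted sum of the five sub-scores.

    Assumes every sub-score is already on the same scale (for example 0 to 1).
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    if not math.isclose(float(sum(weights.values())), 1.0):
        raise ValueError("weights must sum to 1")
    return float(sum(weights[name] * scores[name] for name in weights))


# Made-up sub-scores, for illustration only (not results from the paper).
example = {
    "detectability": 0.9,
    "text_quality": 0.6,
    "usability": 0.3,
    "robustness": 0.8,
    "imperceptibility": 0.4,
}
print(cefw_score(example))
```

Passing a different `weights` dictionary is one way a provider could express its own priorities, as long as the weights still sum to 1.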

Abstract

Text watermarking provides an effective solution for identifying synthetic text generated by large language models. However, existing techniques often focus on satisfying specific criteria while ignoring other key aspects, and the field lacks a unified evaluation. To fill this gap, we propose the Comprehensive Evaluation Framework for Watermark (CEFW), a unified framework that comprehensively evaluates watermarking methods across five key dimensions: ease of detection, fidelity of text quality, minimal embedding cost, robustness to adversarial attacks, and imperceptibility to prevent imitation or forgery. By assessing watermarks according to all these key criteria, CEFW offers a thorough evaluation of their practicality and effectiveness. Moreover, we introduce a simple and effective watermarking method called Balanced Watermark (BW), which guarantees robustness and imperceptibility by balancing the way watermark information is added. Extensive experiments show that BW outperforms existing methods in overall performance across all evaluation dimensions. We release our code to the community for future research at https://github.com/DrankXs/BalancedWatermark.

Paper Structure

This paper contains 43 sections, 11 equations, 4 figures, 7 tables, 3 algorithms.

Figures (4)

  • Figure 1: Watermarks have three distinct audiences: LLM providers, LLM users, and detection service users. In terms of application, these audiences require watermarks to have detectability, text quality, and usability. In terms of security, LLM providers must also ensure the imperceptibility and robustness of watermarks to counteract scrubbing attacks and spoofing attacks by malicious attackers.
  • Figure 2: Differences between the partition functions in KGW, UNIW, and BW. The Preparing Stage takes place before the prompt is entered into the LLM. The Generating Stage takes place while the LLM generates a response.
  • Figure 3: Watermark Complexity Analysis.
  • Figure 4: An overview of CEFW. The upper area, delimited by dotted lines, is the traditional watermark evaluation process used in current watermark studies. The bottom area shows the work of CEFW. CEFW organizes the current watermark metrics into scores for five necessary characteristics. Subsequently, CEFW introduces demand weights from LLM service providers to obtain a comprehensive watermark score.