CEFW: A Comprehensive Evaluation Framework for Watermark in Large Language Models
Shuhao Zhang, Bo Cheng, Jiale Han, Yuli Chen, Zhixuan Wu, Changbao Li, Pingli Gu
TL;DR
This work introduces the Comprehensive Evaluation Framework for Watermark in Large Language Models (CEFW), a unified scheme that assesses watermarking methods along five axes (detectability, text quality, usability, robustness, and imperceptibility) and combines them into a comprehensive score, $S_{CEFW} = \frac{1}{6}S_{D} + \frac{1}{6}S_{T} + \frac{1}{6}S_{U} + \frac{1}{4}S_{R} + \frac{1}{4}S_{I}$. It also presents Balanced Watermark (BW), a watermark design that balances dynamic and static partitioning to improve performance across these criteria. In experiments on C4 and Quora-QA with OPT-2.7b and Llama3-8b, BW achieves the highest comprehensive scores and shows better text quality and robustness than KGW and UNIW. The authors release open-source code so that LLM providers and researchers can benchmark and adapt watermarking methods. Together, CEFW and BW offer a practical basis for evaluating and deploying watermarking in real-world LLM services.
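As a minimal sketch of the weighted sum above, the following Python snippet combines five sub-scores into a single comprehensive score. Only the weights come from the formula. The function name `cefw_score`, the dictionary keys, and the assumption that each sub-score has already been normalized to a common scale (here [0, 1]) are illustrative choices, not taken from the authors' released code.

```python
from fractions import Fraction

# Weights taken from the S_CEFW formula: 1/6 each for detectability,
# text quality, and usability; 1/4 each for robustness and imperceptibility.
# The key names are hypothetical labels chosen for this sketch.
CEFW_WEIGHTS = {
    "detectability": Fraction(1, 6),
    "text_quality": Fraction(1, 6),
    "usability": Fraction(1, 6),
    "robustness": Fraction(1, 4),
    "imperceptibility": Fraction(1, 4),
}


def cefw_score(scores):
    """Return the weighted sum of the five sub-scores.

    Assumes every sub-score is already on a common scale (e.g. [0, 1]);
    how each sub-score is computed is outside the scope of this sketch.
    """
    missing = set(CEFW_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    return float(sum(CEFW_WEIGHTS[name] * scores[name] for name in CEFW_WEIGHTS))


if __name__ == "__main__":
    # Made-up sub-scores, used only to show the arithmetic.
    example = {
        "detectability": 0.6,
        "text_quality": 0.6,
        "usability": 0.6,
        "robustness": 0.8,
        "imperceptibility": 0.4,
    }
    print(cefw_score(example))
```

Because the weights sum to 1, the comprehensive score stays on the same scale as the sub-scores. Robustness and imperceptibility each carry a weight of 1/4, so together they account for half of the total.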
Abstract
Text watermarking provides an effective solution for identifying synthetic text generated by large language models. However, existing techniques often focus on satisfying specific criteria while neglecting other key aspects, and the field lacks a unified evaluation. To fill this gap, we propose the Comprehensive Evaluation Framework for Watermark (CEFW), a unified framework that evaluates watermarking methods across five key dimensions: ease of detection, fidelity of text quality, minimal embedding cost, robustness to adversarial attacks, and imperceptibility to prevent imitation or forgery. By assessing watermarks against all of these criteria, CEFW offers a thorough evaluation of their practicality and effectiveness. Moreover, we introduce a simple and effective watermarking method called Balanced Watermark (BW), which guarantees robustness and imperceptibility by balancing the way watermark information is added. Extensive experiments show that BW outperforms existing methods in overall performance when all evaluation dimensions are considered. We release our code to the community for future research at https://github.com/DrankXs/BalancedWatermark.
