CoheMark: A Novel Sentence-Level Watermark for Enhanced Text Quality
Junyan Zhang, Shuliang Liu, Aiwei Liu, Yubo Gao, Jungang Li, Xiaojie Gu, Xuming Hu
TL;DR
CoheMark introduces a cohesion-aware sentence-level watermarking framework for LLM-generated text that combines three components (Embedder, FuzzyClusterer, and CoheSampler) to exploit cross-sentence cohesion. It uses fuzzy c-means clustering to create soft semantic spaces, together with two versions of the Next Sentence Selection Criteria and a Switching Rule, so that green semantic regions guide next-sentence generation; this preserves semantic fluency while enabling robust watermark detection. Comprehensive experiments across six baselines, two base LLMs, and two datasets demonstrate strong watermark detectability with minimal text-quality impact, and the approach shows resilience to paraphrase attacks when evaluated with large-language-model judges. The study highlights that leveraging intra-text coherence yields practical advantages for watermarking in real-world text-generation scenarios, offering a viable path toward trustworthy, traceable AI-generated content.
Abstract
Watermarking is a method for tracing the use of content generated by large language models. Sentence-level watermarking helps preserve the semantic integrity of individual sentences while offering greater robustness. However, many existing sentence-level watermarking techniques depend on arbitrary segmentation or generation processes to embed watermarks, which can limit the availability of appropriate sentences. This limitation, in turn, compromises the quality of the generated response. To address the challenge of balancing high text quality with robust watermark detection, we propose CoheMark, an advanced sentence-level watermarking technique that exploits the cohesive relationships between sentences for better logical fluency. The core methodology of CoheMark involves selecting sentences through trained fuzzy c-means clustering and applying specific next sentence selection criteria. Experimental evaluations demonstrate that CoheMark achieves strong watermark strength while exerting minimal impact on text quality.
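
To make the notion of soft cluster assignment concrete, the following is a minimal sketch of the standard fuzzy c-means membership formula, not CoheMark's actual implementation. The function name `fuzzy_memberships`, the Euclidean distance, the fuzzifier value m = 2, and the toy 2-D vectors standing in for sentence embeddings are all illustrative assumptions; how CoheMark trains its clusters and marks regions as green is not specified in this summary.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Return the fuzzy c-means membership of `point` in each cluster.

    Hypothetical helper for illustration only; `m` is the fuzzifier (> 1).
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]

    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]

    exponent = 2.0 / (m - 1.0)
    # Standard update: u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
            for d_j in dists]


# Toy data (assumed): three cluster centers and one "sentence embedding".
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
sentence_vec = (1.0, 0.0)
u = fuzzy_memberships(sentence_vec, centers)
```

Unlike hard clustering, each sentence vector receives a graded membership in every cluster, and the memberships sum to 1. The nearest center gets the largest share, which is what allows a sentence to sit partly in several semantic regions at once.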
