k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text
Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, Tianxing He
TL;DR
k-SemStamp addresses the vulnerability of semantic watermarks to paraphrase attacks by replacing SemStamp's random-LSH partitioning of the embedding space with domain-informed $k$-means clustering. It introduces a cluster-margin constraint and domain-specific sentence embeddings to improve robustness and sampling efficiency while preserving generation quality. Evaluated on RealNews and BookSum against multiple paraphrasers, it achieves higher detection performance (AUC, TP@1%, and TP@5%) than SemStamp, KGW, and SIR, and requires fewer samples to generate each accepted sentence. Domain shift dampens the gains, but the method remains more robust than the baselines, supporting its practicality for the governance of machine-generated text.
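The per-sentence acceptance test and detection statistic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The centroids are taken as given; in the method they come from $k$-means over domain-specific sentence embeddings. The helper names `assign_cluster`, `valid_clusters`, `accept`, and `z_score` are hypothetical. The margin is assumed to be the gap between the best and second-best cosine similarity to a centroid. The valid clusters are assumed to be a `gamma` fraction chosen by a PRNG seeded with the previous sentence's cluster. Detection is assumed to use a one-proportion z-test on how many sentences land in valid clusters.

```python
import math
import random


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def assign_cluster(embedding, centroids):
    # Nearest centroid by cosine similarity, plus the gap between the best and
    # second-best similarity (ASSUMED definition of the cluster margin).
    sims = sorted(((cosine(embedding, c), i) for i, c in enumerate(centroids)),
                  reverse=True)
    best_sim, best_idx = sims[0]
    return best_idx, best_sim - sims[1][0]


def valid_clusters(prev_cluster, k, gamma):
    # ASSUMED seeding scheme: the previous sentence's cluster seeds a PRNG
    # that marks a gamma fraction of the k clusters as valid.
    rng = random.Random(prev_cluster)
    return set(rng.sample(range(k), max(1, int(gamma * k))))


def accept(embedding, prev_cluster, centroids, gamma, min_margin):
    # Rejection-sampling test for one candidate sentence: it must fall in a
    # valid cluster AND clear the margin, so small paraphrase-induced shifts
    # in the embedding are less likely to move it to another cluster.
    idx, margin = assign_cluster(embedding, centroids)
    ok = idx in valid_clusters(prev_cluster, len(centroids), gamma) and margin > min_margin
    return ok, idx


def z_score(n_valid, n_sentences, gamma):
    # One-proportion z-test: unwatermarked text should hit valid clusters at
    # rate gamma, so a large positive score suggests a watermark.
    return (n_valid - gamma * n_sentences) / math.sqrt(gamma * (1 - gamma) * n_sentences)
```

During generation, a candidate sentence would be resampled until `accept` returns `True`; at detection time, each sentence's cluster seeds the valid set for the next one, and the count of hits feeds `z_score`.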
Abstract
Recent watermarked generation algorithms inject detectable signatures during language generation to facilitate post-hoc detection. While token-level watermarks are vulnerable to paraphrase attacks, SemStamp (Hou et al., 2023) applies the watermark to the semantic representation of sentences and demonstrates promising robustness. SemStamp employs locality-sensitive hashing (LSH) to partition the semantic space with arbitrary hyperplanes, which results in a suboptimal tradeoff between robustness and speed. We propose k-SemStamp, a simple yet effective enhancement of SemStamp that uses k-means clustering as an alternative to LSH, partitioning the embedding space with awareness of its inherent semantic structure. Experimental results indicate that k-SemStamp notably improves robustness and sampling efficiency while preserving generation quality, advancing a more effective tool for machine-generated text detection.
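To illustrate the contrast drawn above, the sketch below partitions an embedding space in both ways. The first is SemStamp-style LSH, where the signs of projections onto `d` random hyperplanes form a `d`-bit region index. The second is plain Lloyd's $k$-means, whose centroids follow the data. The function names, Gaussian hyperplane sampling, squared Euclidean distance, and fixed iteration count are all assumptions for illustration; the actual method uses its own sentence encoder and training setup.

```python
import random


def random_hyperplanes(d, dim, seed=0):
    # d random hyperplane normals through the origin (Gaussian entries, ASSUMED).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(d)]


def lsh_region(embedding, hyperplanes):
    # One bit per hyperplane (which side the embedding falls on); the bit
    # string indexes one of 2**d regions, independent of where the data lie.
    region = 0
    for normal in hyperplanes:
        dot = sum(a * b for a, b in zip(embedding, normal))
        region = (region << 1) | (1 if dot > 0 else 0)
    return region


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=20, seed=0):
    # Lloyd's algorithm: alternate nearest-centroid assignment and mean update.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[i] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids


def kmeans_region(embedding, centroids):
    # Region index = nearest centroid; boundaries follow the data's structure.
    return min(range(len(centroids)), key=lambda i: sq_dist(embedding, centroids[i]))
```

On two well-separated blobs of points, `kmeans` places one centroid per blob, so nearby points share a region, whereas every LSH boundary passes through the origin regardless of where the data lie. For unit-normalized embeddings, nearest-centroid by squared Euclidean distance picks the same centroid as highest cosine similarity, provided the centroids themselves are also unit-normalized.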
