k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text

Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, Tianxing He

TL;DR

k-SemStamp addresses the vulnerability of semantic watermarks to paraphrase attacks by replacing SemStamp's random LSH partitioning of the semantic space with domain-informed $k$-means clustering. It introduces a cluster-margin constraint and domain-specific sentence embeddings to improve robustness and sampling efficiency while preserving generation quality. Evaluated on RealNews and BookSum against multiple paraphrasers, it achieves higher detection performance (AUC, plus true positive rate at 1% and 5% false positive rates, TP@1% and TP@5%) than SemStamp, KGW, and SIR, and it needs fewer sampling attempts before a sentence is accepted. Domain shift reduces these gains, but the method remains more robust than the baselines, which supports its practicality for governing machine-generated text.

Abstract

Recent watermarked generation algorithms inject detectable signatures during language generation to facilitate post-hoc detection. While token-level watermarks are vulnerable to paraphrase attacks, SemStamp (Hou et al., 2023) applies a watermark to the semantic representation of sentences and demonstrates promising robustness. SemStamp employs locality-sensitive hashing (LSH) to partition the semantic space with arbitrary hyperplanes, which results in a suboptimal tradeoff between robustness and speed. We propose k-SemStamp, a simple yet effective enhancement of SemStamp that uses k-means clustering as an alternative to LSH, partitioning the embedding space with awareness of its inherent semantic structure. Experimental results indicate that k-SemStamp markedly improves robustness and sampling efficiency while preserving generation quality, advancing a more effective tool for machine-generated text detection.
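The contrast between the two partitioning schemes can be illustrated with a toy sketch. It is not the authors' implementation: the 2-D vectors, the single hyperplane, the fixed centroids, and the helper names `lsh_region` and `kmeans_region` are all hypothetical, and cosine similarity is assumed as the closeness measure. In practice the centroids would come from running k-means on sentence embeddings from the target domain.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def lsh_region(embedding, hyperplane_normals):
    """Region index read off the sign pattern against each random hyperplane."""
    bits = ["1" if dot(embedding, n) > 0 else "0" for n in hyperplane_normals]
    return int("".join(bits), 2)

def kmeans_region(embedding, centroids):
    """Region index of the centroid with the highest cosine similarity."""
    sims = [cosine(embedding, c) for c in centroids]
    return max(range(len(centroids)), key=sims.__getitem__)

# Two near-identical toy embeddings (hypothetical values),
# e.g. a sentence and its paraphrase.
sentence = [1.0, 0.05]
paraphrase = [1.0, -0.05]

# Hypothetical hyperplane that happens to pass between them.
normals = [[0.0, 1.0]]

# Hypothetical centroids standing in for fitted k-means cluster centers.
centroids = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]

print(lsh_region(sentence, normals), lsh_region(paraphrase, normals))
print(kmeans_region(sentence, centroids), kmeans_region(paraphrase, centroids))
```

In this toy setup the arbitrary hyperplane separates the two similar embeddings, while the nearest-centroid rule keeps them in the same region, which is the intuition behind partitioning with awareness of semantic structure.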

Paper Structure

This paper contains 24 sections, 4 equations, 5 figures, 5 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Illustrations of the semantic space. Sentence embeddings with close meanings share similar colors. (Left) Random planes from LSH arbitrarily partition the semantic space and split similar sentences into different regions. (Right) Margin-based rejection in $k$-SemStamp. Sentence embeddings which fall into the gray-shaded areas of a valid region will be rejected.
  • Figure 2: An overview of the proposed $k$-SemStamp algorithm. $k$-means clustering partitions the semantic space into semantically similar regions. The sentence generation is accepted if the closest cluster of its sentence embedding corresponds to a "valid" region in the semantic space.
  • Figure 3: Generation examples of $k$-SemStamp compared with SemStamp. Both generations are contextually sensible and coherent relative to non-watermarked generations. Additional examples after paraphrasing are presented in Figure 5 in the Appendix.
  • Figure 4: Detection results (AUC) under different generation lengths. In most cases, $k$-SemStamp is more robust than SemStamp and KGW across generation lengths of 100-400 tokens.
  • Figure 5: Examples of $k$-SemStamp after being paraphrased by the Pegasus paraphraser (Zhang et al., 2020). Green and plain sentences are detected, while red and underlined sentences are not. $k$-SemStamp generations are more robust to paraphrase, having a higher detection $z$-score than SemStamp.
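
The acceptance rule described in Figures 1 and 2 and the detection $z$-score mentioned in Figure 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the margin is taken here as the gap between the two highest cosine similarities to the centroids, the $z$-score is the standard one-proportion test on the count of sentences in valid regions, and the names `accept`, `z_score`, `valid_clusters`, and `valid_ratio` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def assign_with_margin(embedding, centroids):
    """Return (closest cluster index, gap between the top two similarities)."""
    sims = sorted(
        ((cosine(embedding, c), i) for i, c in enumerate(centroids)),
        reverse=True,
    )
    (best_sim, best_idx), (second_sim, _) = sims[0], sims[1]
    return best_idx, best_sim - second_sim

def accept(embedding, centroids, valid_clusters, margin):
    """Accept only if the closest cluster is valid AND the embedding is not
    too close to a cluster boundary (assumed margin definition)."""
    idx, gap = assign_with_margin(embedding, centroids)
    return idx in valid_clusters and gap > margin

def z_score(num_valid, num_total, valid_ratio):
    """One-proportion z-test: how far the number of sentences in valid
    regions exceeds what unwatermarked text would produce by chance."""
    expected = valid_ratio * num_total
    std = math.sqrt(valid_ratio * (1 - valid_ratio) * num_total)
    return (num_valid - expected) / std

# Hypothetical toy setup: two centroids, only cluster 0 is valid.
centroids = [[1.0, 0.0], [0.0, 1.0]]
valid_clusters = {0}

print(accept([1.0, 0.1], centroids, valid_clusters, margin=0.2))  # well inside cluster 0
print(accept([1.0, 0.9], centroids, valid_clusters, margin=0.2))  # near the boundary
print(accept([0.1, 1.0], centroids, valid_clusters, margin=0.2))  # closest cluster not valid
print(z_score(num_valid=9, num_total=10, valid_ratio=0.25))
```

During generation, a candidate sentence would be resampled until `accept` returns true. The margin trades sampling effort for robustness: a larger margin rejects more candidates but leaves accepted sentences farther from cluster boundaries, so a paraphrase is less likely to move them into a different region.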