FreqMark: Frequency-Based Watermark for Sentence-Level Detection of LLM-Generated Text

Zhenyu Xu, Kun Zhang, Victor S. Sheng

TL;DR

FreqMark is a watermarking technique that embeds a periodic, frequency-based signal in LLM-generated text during token sampling; the signal can then be detected with Short-Time Fourier Transform (STFT) analysis.

Abstract

The increasing use of Large Language Models (LLMs) for generating highly coherent and contextually relevant text introduces new risks, including misuse for unethical purposes such as disinformation or academic dishonesty. To address these challenges, we propose FreqMark, a novel watermarking technique that embeds detectable frequency-based watermarks in LLM-generated text during the token sampling process. The method leverages periodic signals to guide token selection, creating a watermark that can be detected with Short-Time Fourier Transform (STFT) analysis. This approach enables accurate identification of LLM-generated content, even in mixed-text scenarios with both human-authored and LLM-generated segments. Our experiments demonstrate the robustness and precision of FreqMark, showing strong detection capabilities against various attack scenarios such as paraphrasing and token substitution. Results show that FreqMark achieves an AUC improvement of up to 0.98, significantly outperforming existing detection methods.
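
To make the idea of a periodic signal guiding token selection concrete, here is a minimal toy sketch. It is not the authors' algorithm: the keyed hash `token_score`, the sinusoidal `target`, the "closest score wins" selection rule, the frequency of 0.1 cycles per token, and the random candidate lists standing in for an LLM's top-k tokens are all assumptions made for illustration.

```python
import cmath
import hashlib
import math
import random
from typing import Sequence

FREQ = 0.1  # cycles per token; hypothetical value chosen for this demo


def token_score(token_id: int) -> float:
    """Map a token id to a fixed pseudo-random value in [0, 1) (hypothetical keyed hash)."""
    digest = hashlib.sha256(f"demo-key:{token_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def target(t: int) -> float:
    """Periodic target value in [0, 1] for token position t."""
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * FREQ * t))


def sample_watermarked(candidates: Sequence[int], t: int) -> int:
    """Pick the candidate token whose score is closest to the periodic target."""
    return min(candidates, key=lambda tok: abs(token_score(tok) - target(t)))


def spectral_strength(tokens: Sequence[int], freq: float = FREQ) -> float:
    """Normalized DFT magnitude of the mean-removed score sequence at `freq`."""
    scores = [token_score(tok) for tok in tokens]
    mean = sum(scores) / len(scores)
    acc = sum((s - mean) * cmath.exp(-2j * math.pi * freq * t)
              for t, s in enumerate(scores))
    return abs(acc) / len(scores)


rng = random.Random(0)
VOCAB, TOP_K, N = 50_000, 8, 200

# Stand-in for an LLM: at each step the "top-k candidates" are random token ids.
watermarked = [
    sample_watermarked([rng.randrange(VOCAB) for _ in range(TOP_K)], t)
    for t in range(N)
]
# Unwatermarked text: tokens chosen without reference to the periodic signal.
unmarked = [rng.randrange(VOCAB) for _ in range(N)]

print("watermarked:", spectral_strength(watermarked))
print("unmarked:   ", spectral_strength(unmarked))
```

In this toy setup the watermarked sequence carries a strong component at the embedding frequency, while the unmarked sequence shows only noise-level energy there. A real sampler would also have to weigh the model's token probabilities, which this sketch ignores.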

Paper Structure

This paper contains 24 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Example of Watermark Detection for Concatenated Text. The italic text represents the human-authored prompt, while the regular-typeface text was generated by gpt-3.5-turbo-instruct with an embedded watermark. The yellow highlighted segments indicate portions of the text where the watermark was detected and correctly identified as LLM-generated. Conversely, the green segments represent errors where the text was mistakenly flagged. GPTZero fails completely on this example, classifying the text as entirely human-written with high confidence.
  • Figure 2: STFT Analysis of Concatenated LLM-generated and Human-authored Text. The bright yellow regions indicate significant frequency components around 0.1 Hz, corresponding to the periodic signal used for watermark embedding. These peaks highlight LLM-generated content, distinguishing it from human-written segments.
  • Figure 3: FreqMark watermark robustness under various attack scenarios.
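
The kind of short-time analysis described for Figure 2 can be illustrated with a small self-contained sketch. It assumes each token has already been mapped to a numeric score (a hypothetical preprocessing step) and that the watermark frequency is 0.1 cycles per token. The window length, hop size, threshold, and synthetic "human" and "LLM" score sequences are illustrative choices, not values from the paper.

```python
import cmath
import math
import random


def stft_magnitude(signal, freq, window=40, hop=10):
    """Hann-windowed short-time DFT magnitude of `signal` at a single frequency.

    Returns a list of (frame_start, magnitude) pairs; `freq` is in cycles per sample.
    """
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (window - 1))
            for n in range(window)]
    norm = sum(hann)
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = signal[start:start + window]
        mean = sum(frame) / window  # remove the DC offset of this frame
        acc = sum(w * (x - mean) * cmath.exp(-2j * math.pi * freq * n)
                  for n, (w, x) in enumerate(zip(hann, frame)))
        frames.append((start, abs(acc) / norm))
    return frames


rng = random.Random(1)
FREQ = 0.1       # hypothetical watermark frequency, cycles per token
WINDOW = 40      # frame length in tokens (illustrative)
THRESHOLD = 0.12  # illustrative decision threshold

# Synthetic per-token scores: 100 "human" positions (no periodic structure)
# followed by 100 "LLM" positions carrying a noisy sinusoid at FREQ.
human = [rng.random() for _ in range(100)]
llm = [0.5 + 0.4 * math.sin(2.0 * math.pi * FREQ * t) + rng.uniform(-0.1, 0.1)
       for t in range(100)]
scores = human + llm

frames = stft_magnitude(scores, FREQ, window=WINDOW)
human_mags = [m for start, m in frames if start + WINDOW <= 100]
llm_mags = [m for start, m in frames if start >= 100]

for start, mag in frames:
    label = "flagged" if mag > THRESHOLD else "-"
    print(f"tokens {start:3d}-{start + WINDOW - 1:3d}  |X| = {mag:.3f}  {label}")
```

Frames that lie entirely in the synthetic LLM segment tend to show much larger magnitude at the target frequency than frames in the human segment, which is what allows a mixed text to be localized segment by segment. Frames that straddle the boundary fall in between, and an occasional human frame may still cross the threshold by chance.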