Topic-Based Watermarks for Large Language Models

Alexander Nemecek, Yuzhou Jiang, Erman Ayday

TL;DR

The paper tackles the problem of reliably attributing AI-generated text by introducing Topic-Based Watermarks (TBW), a lightweight, topic-guided watermarking scheme for LLMs. TBW constructs topic-aligned green lists via token-to-topic mappings and embeds watermarks through a single-pass logit bias guided by prompt-derived topics, avoiding additional decoding steps. Empirical results show TBW achieves perplexity comparable to production watermarking like SynthID-Text while delivering enhanced robustness against paraphrasing and lexical perturbations and maintaining high efficiency. The approach offers a practical path toward globally deployable watermarking with resilient detection and minimal impact on text quality.

Abstract

The indistinguishability of Large Language Model (LLM) output from human-authored content poses significant challenges, raising concerns about potential misuse of AI-generated text and its influence on future AI model training. Watermarking algorithms offer a viable solution by embedding detectable signatures into generated text. However, existing watermarking methods often entail trade-offs among attack robustness, generation quality, and additional overhead such as specialized frameworks or complex integrations. We propose a lightweight, topic-guided watermarking scheme for LLMs that partitions the vocabulary into topic-aligned token subsets. Given an input prompt, the scheme selects a relevant topic-specific token list, effectively "green-listing" semantically aligned tokens to embed robust marks while preserving the text's fluency and coherence. Experimental results across multiple LLMs and state-of-the-art benchmarks demonstrate that our method achieves perplexity comparable to industry-leading systems, including Google's SynthID-Text, while enhancing watermark robustness against paraphrasing and lexical perturbation attacks and introducing minimal performance overhead. Our approach avoids reliance on additional mechanisms beyond standard text generation pipelines, which facilitates straightforward adoption and suggests a practical path toward globally consistent watermarking of AI-generated content.
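
The scheme described above (topic-aligned green lists, a prompt-derived topic, and a single-pass logit bias) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the vocabulary, the token-to-topic mapping, the keyword-overlap topic selector, the bias strength `delta`, and all function names are assumptions made purely for illustration, and the green-fraction statistic at the end is a simplified stand-in for whatever detection test the paper actually uses.

```python
import math

# Hypothetical toy vocabulary and token-to-topic mapping (illustration only).
VOCAB = ["goal", "match", "team", "stock", "market", "bank", "the", "and"]
TOKEN_TOPIC = {
    "goal": "sports", "match": "sports", "team": "sports",
    "stock": "finance", "market": "finance", "bank": "finance",
}
# Hypothetical prompt keywords per topic; a real system would use a topic model.
TOPIC_KEYWORDS = {
    "sports": {"football", "league"},
    "finance": {"invest", "money"},
}


def build_green_lists(vocab, token_topic):
    """Group tokens by their assigned topic to form one green list per topic."""
    green = {}
    for tok in vocab:
        topic = token_topic.get(tok)
        if topic is not None:
            green.setdefault(topic, set()).add(tok)
    return green


def pick_topic(prompt, topic_keywords):
    """Pick the topic whose keywords overlap the prompt the most."""
    words = set(prompt.lower().split())
    return max(sorted(topic_keywords), key=lambda t: len(words & topic_keywords[t]))


def bias_logits(logits, vocab, green_list, delta=2.0):
    """Single pass over the logits: add delta to every green-listed token."""
    return [l + delta if tok in green_list else l for tok, l in zip(vocab, logits)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def green_fraction(tokens, green_list):
    """Share of tokens that fall in the green list (a simplified detection signal)."""
    return sum(t in green_list for t in tokens) / len(tokens) if tokens else 0.0


# Usage: choose a topic from the prompt, then bias uniform logits toward its green list.
green_lists = build_green_lists(VOCAB, TOKEN_TOPIC)
topic = pick_topic("Which football league is best", TOPIC_KEYWORDS)
biased = bias_logits([0.0] * len(VOCAB), VOCAB, green_lists[topic])
probs = softmax(biased)
```

In this sketch the bias is applied once per decoding step to the model's existing logits, which is why no extra decoding pass is needed; text generated this way would tend to show a higher green fraction for the prompt's topic than unwatermarked text.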

Paper Structure

This paper contains 22 sections, 2 equations, 6 figures, 4 tables, 3 algorithms.

Figures (6)

  • Figure 1: Text perplexity comparison across watermarking schemes for two LLMs: (Top) OPT-6.7B, (Bottom) Gemma-7B. Lower perplexity indicates higher generated-text quality.
  • Figure 2: Comparison of average generation time (seconds) over various output token lengths from multiple watermarking schemes on OPT-6.7B.
  • Figure 3: Detection scores for different watermarking schemes under combination attacks: random word perturbations (left) and targeted word perturbations (right) affecting nouns, verbs, etc. Solid ticks indicate scores above the threshold, while white ticks represent scores below the threshold. Higher scores indicate higher robustness to perturbation attacks.
  • Figure 4: Comparison of average generation time (seconds) over various output token lengths from multiple watermarking schemes on OPT-2.7B.
  • Figure 5: Comparison of ROC curves for different watermarking methods applied to OPT-6.7B and Gemma-7B against PEGASUS paraphrasing attacks.
  • ...and 1 more figure