A Watermark for Order-Agnostic Language Models

Ruibo Chen, Yihan Wu, Yanshuo Chen, Chenxi Liu, Junfeng Guo, Heng Huang

TL;DR

This work develops a Markov-chain-based watermark generator that produces watermark key sequences containing high-frequency key patterns. It also proposes a statistical, pattern-based detection algorithm that recovers the key sequence during detection and runs statistical tests on the count of high-frequency patterns.

Abstract

Statistical watermarking techniques are well-established for sequentially decoded language models (LMs). However, these techniques cannot be directly applied to order-agnostic LMs, as the tokens in order-agnostic LMs are not generated sequentially. In this work, we introduce Pattern-mark, a pattern-based watermarking framework specifically designed for order-agnostic LMs. We develop a Markov-chain-based watermark generator that produces watermark key sequences with high-frequency key patterns. Correspondingly, we propose a statistical pattern-based detection algorithm that recovers the key sequence during detection and conducts statistical tests based on the count of high-frequency patterns. Our extensive evaluations on order-agnostic LMs, such as ProteinMPNN and CMLM, demonstrate Pattern-mark's enhanced detection efficiency, generation quality, and robustness, positioning it as a superior watermarking technique for order-agnostic LMs.
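The detection step described above has three parts: recover the key sequence, count high-frequency patterns, and run a statistical test. The following is a minimal Python sketch of the last two parts. It makes several hypothetical assumptions. Recovered keys are integers in {0, ..., K-1}. The "high-frequency" patterns are given as length-m tuples. Under the null hypothesis (unwatermarked content), keys are i.i.d. uniform. In place of the paper's own test statistic, it uses a generic Monte Carlo p-value. `count_patterns` and `monte_carlo_p_value` are illustrative names, not the authors' code.

```python
import random
from typing import Iterable, Sequence, Tuple


def count_patterns(keys: Sequence[int], patterns: Iterable[Tuple[int, ...]]) -> int:
    """Count overlapping length-m windows of `keys` that equal any target pattern."""
    targets = {tuple(p) for p in patterns}
    if not targets:
        return 0
    lengths = {len(p) for p in targets}
    if len(lengths) != 1:
        raise ValueError("all patterns must share the same length m")
    m = lengths.pop()
    # Sequences shorter than m yield an empty range, hence a count of 0.
    return sum(tuple(keys[i:i + m]) in targets for i in range(len(keys) - m + 1))


def monte_carlo_p_value(keys: Sequence[int], patterns: Iterable[Tuple[int, ...]],
                        num_keys: int, trials: int = 2000, seed: int = 0) -> float:
    """Estimate P(count >= observed) when keys are i.i.d. uniform (the assumed null)."""
    patterns = [tuple(p) for p in patterns]  # materialise so it can be reused
    observed = count_patterns(keys, patterns)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        null_keys = [rng.randrange(num_keys) for _ in range(len(keys))]
        if count_patterns(null_keys, patterns) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate a valid (slightly conservative) p-value.
    return (1 + exceed) / (1 + trials)


# Usage: an alternating key sequence versus a constant one.
high_freq = [(0, 1, 0, 1), (1, 0, 1, 0)]
watermarked_keys = [i % 2 for i in range(40)]
plain_keys = [0] * 40
p_wm = monte_carlo_p_value(watermarked_keys, high_freq, num_keys=2)
p_plain = monte_carlo_p_value(plain_keys, high_freq, num_keys=2)
```

Counting overlapping windows keeps the statistic simple. The Monte Carlo estimate (1 + exceedances) / (1 + trials) is a valid but coarse p-value, because its smallest attainable value is 1 / (1 + trials). An analytic null distribution for pattern counts would be faster and more precise.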

Paper Structure

This paper contains 18 sections, 1 equation, 6 figures, 7 tables, and 4 algorithms.

Figures (6)

  • Figure 1: Sequential decoding vs. order-agnostic decoding. Sequentially decoded text follows a fixed, left-to-right construction, while order-agnostic decoding generates text by filling in words without adhering to the usual reading order. Most current watermarking methods rely on previously generated context (n-grams), which is not consistently available in order-agnostic LMs.
  • Figure 2: Illustration of Pattern-mark. Watermark generation begins with a Markov-chain-based key generator that produces a key sequence. This sequence is then used to modify token probabilities during language model sampling. In the detection phase, the key sequence is recovered from the generated content, and the false positive rate is calculated by counting occurrences of specific patterns within the recovered key sequence.
  • Figure 3: Comparison of the trade-off between TPR and generation quality on the protein generation and machine translation tasks. Left. TPR@FPR=0.1% vs. pLDDT on the protein generation task. Right. TPR@FPR=0.1% vs. BLEU on the machine translation task.
  • Figure 4: Evaluation of the effect of pattern length $m$ on the detection efficiency of Pattern-mark. Left. Protein generation task. Right. Machine translation task.
  • Figure 5: Evaluation of the effect of transition-matrix probability on the generation quality of Pattern-mark. Left. Protein generation task. Right. Machine translation task.
  • ...and 1 more figure
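The Figure 2 loop uses the key sequence to reweight token probabilities and later recovers the keys from the output. The minimal Python sketch below illustrates that loop under several assumptions. Each key selects a hypothetical vocabulary partition, here simply `token_id % num_keys`, standing in for a secret keyed partition. Sampling adds a bias `delta` to the logits of tokens in the selected partition. Key k_i is attached to output position i rather than to the decoding step, so positions can be filled in any order without needing preceding context. The partition rule, `delta`, the random placeholder logits, and all function names are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from typing import List, Sequence


def partition_of(token_id: int, num_keys: int) -> int:
    """Hypothetical key-dependent vocabulary partition (stand-in for a secret hash)."""
    return token_id % num_keys


def watermarked_sample(logits: Sequence[float], key: int, num_keys: int,
                       delta: float, rng: random.Random) -> int:
    """Sample a token id after adding `delta` to logits in the partition chosen by `key`."""
    biased = [x + (delta if partition_of(t, num_keys) == key else 0.0)
              for t, x in enumerate(logits)]
    top = max(biased)
    weights = [math.exp(b - top) for b in biased]  # softmax up to normalisation
    return rng.choices(range(len(logits)), weights=weights)[0]


def recover_keys(tokens: Sequence[int], num_keys: int) -> List[int]:
    """Detection side: read each position's key off the partition of its token."""
    return [partition_of(t, num_keys) for t in tokens]


def generate_order_agnostic(keys: Sequence[int], vocab_size: int, num_keys: int,
                            delta: float, seed: int = 0) -> List[int]:
    """Fill positions in a random order; position i is always biased by keys[i]."""
    rng = random.Random(seed)
    tokens = [0] * len(keys)  # every entry is overwritten below
    order = list(range(len(keys)))
    rng.shuffle(order)
    for pos in order:
        # Placeholder logits; a real order-agnostic LM would condition on filled positions.
        logits = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
        tokens[pos] = watermarked_sample(logits, keys[pos], num_keys, delta, rng)
    return tokens


# Usage: with a strong bias, most recovered keys should match the original ones.
keys = [i % 2 for i in range(30)]
tokens = generate_order_agnostic(keys, vocab_size=20, num_keys=2, delta=4.0)
recovered = recover_keys(tokens, num_keys=2)
agreement = sum(a == b for a, b in zip(recovered, keys)) / len(keys)
```

A larger `delta` makes key recovery more reliable but distorts the model's distribution more. This is the same detection-versus-quality trade-off that Figure 3 reports, although the paper's actual reweighting rule may differ from this sketch.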

Theorems & Definitions (1)

  • Definition 3.1: Markov-chain based key sequence generation
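Definition 3.1 is listed here only by name. The following minimal Python sketch shows Markov-chain key sequence generation under several assumptions. The key alphabet is finite. The transition matrix `transition` (a hypothetical name) is row-stochastic. The initial key is drawn uniformly. An integer `seed` stands in for the secret watermark key. With a strongly off-diagonal matrix such as [[0.1, 0.9], [0.9, 0.1]], alternating patterns like (0, 1, 0, 1) become high-frequency, which is the kind of structure a pattern-counting detector can exploit. How the paper chooses its matrix and patterns is not specified in this summary.

```python
import random
from typing import List, Sequence


def generate_key_sequence(transition: Sequence[Sequence[float]], length: int,
                          seed: int) -> List[int]:
    """Sample keys k_1..k_n from a Markov chain with a uniform initial key.

    transition[i][j] is the probability that key j follows key i; `seed`
    stands in for the secret watermark key that makes the sequence reproducible.
    """
    num_keys = len(transition)
    if num_keys == 0:
        raise ValueError("transition matrix must be non-empty")
    for row in transition:
        if len(row) != num_keys or min(row) < 0 or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("transition must be a square, row-stochastic matrix")
    if length <= 0:
        return []
    rng = random.Random(seed)
    keys = [rng.randrange(num_keys)]
    while len(keys) < length:
        # The next key depends only on the current key (Markov property).
        keys.append(rng.choices(range(num_keys), weights=transition[keys[-1]])[0])
    return keys


# Usage: a strongly off-diagonal matrix makes alternating key patterns frequent.
P = [[0.1, 0.9],
     [0.9, 0.1]]
keys = generate_key_sequence(P, length=20, seed=42)
```

Pushing the off-diagonal entries toward 1 makes alternating patterns more frequent in the key sequence. Figure 5 evaluates how transition-matrix probability affects generation quality, but this summary does not give the values the authors used.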