Pseudorandom Error-Correcting Codes

Miranda Christ, Sam Gunn

TL;DR

This work introduces pseudorandom error-correcting codes (PRCs), a cryptographic primitive whose codewords are indistinguishable from random to any efficient adversary that lacks the decoding key, yet can be decoded efficiently with the key even after corruption. The authors construct PRCs from LDPC-like codes, with pseudorandomness based on either $2^{O(\sqrt{n})}$-hardness of LPN, or polynomial hardness of LPN together with the planted XOR problem at low density. They first build zero-bit public-key PRCs that are robust to every $p$-bounded channel, then boost these to constant-rate, multi-bit PRCs, including deletion-robust variants obtained via a majority-encoding trick. The main applications are an undetectable watermark for language-model outputs that is robust to cropping and to a constant rate of random substitutions and deletions, a publicly attributable variant of that watermark, and a constant-rate stateless steganography scheme that is robust to a constant rate of substitutions. PRCs thus combine cryptographic hardness assumptions with error-correcting structure to hide robust signals in AI-generated content.

Abstract

We construct pseudorandom error-correcting codes (or simply pseudorandom codes), which are error-correcting codes with the property that any polynomial number of codewords are pseudorandom to any computationally-bounded adversary. Efficient decoding of corrupted codewords is possible with the help of a decoding key. We build pseudorandom codes that are robust to substitution and deletion errors, where pseudorandomness rests on standard cryptographic assumptions. Specifically, pseudorandomness is based on either $2^{O(\sqrt{n})}$-hardness of LPN, or polynomial hardness of LPN and the planted XOR problem at low density. As our primary application of pseudorandom codes, we present an undetectable watermarking scheme for outputs of language models that is robust to cropping and a constant rate of random substitutions and deletions. The watermark is undetectable in the sense that any number of samples of watermarked text are computationally indistinguishable from text output by the original model. This is the first undetectable watermarking scheme that can tolerate a constant rate of errors. Our second application is to steganography, where a secret message is hidden in innocent-looking content. We present a constant-rate stateless steganography scheme with robustness to a constant rate of substitutions. Ours is the first stateless steganography scheme with provable steganographic security and any robustness to errors.

Paper Structure (73 sections, 42 theorems, 108 equations, 6 figures, 2 tables, 8 algorithms)

Key Result

Theorem 1

Let $p \in (0,1/2)$ be any constant. Under the paper's combined hardness assumption (labelled assumption:combined in the source), there exists a zero-bit public-key PRC that is robust to every $p$-bounded channel.
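
To make "zero-bit" detection under substitution errors concrete, here is a minimal toy sketch in Python. It is not the paper's construction and makes no pseudorandomness claim; it only illustrates how a key made of sparse parity checks can tell a noisy codeword from a uniformly random string. The function names (`keygen`, `encode`, `substitute`, `detect`), the parameters, the 0.7 threshold, and the way the generator matrix is built are all assumptions made for illustration. A binary symmetric channel with flip probability 0.05 stands in for a $p$-bounded channel.

```python
import random

def keygen(n, r, k, t, rng):
    """Toy key generation (hypothetical, for illustration only).

    Returns (rows, checks). rows[i] is row i of an n x k generator matrix
    over GF(2), stored as a k-bit integer. Each check is a tuple of t
    codeword positions whose generator rows XOR to zero, so every noiseless
    codeword satisfies every check.
    """
    free = n - r
    rows = [rng.getrandbits(k) for _ in range(free)]
    checks = []
    for j in range(r):
        support = rng.sample(range(free), t - 1)
        closing_row = 0
        for i in support:
            closing_row ^= rows[i]
        rows.append(closing_row)  # position free + j closes check j
        checks.append(tuple(support) + (free + j,))
    return rows, checks

def encode(rows, k, rng):
    """Codeword bit i is the GF(2) inner product of row i with a random seed."""
    seed = rng.getrandbits(k)
    return [bin(row & seed).count("1") & 1 for row in rows]

def substitute(x, p, rng):
    """Flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in x]

def detect(x, checks, threshold=0.7):
    """Accept when at least a `threshold` fraction of checks is satisfied."""
    satisfied = sum(1 for chk in checks if sum(x[i] for i in chk) % 2 == 0)
    return satisfied >= threshold * len(checks)

rng = random.Random(0)
N, R, K, T = 600, 200, 64, 3  # toy sizes, chosen only for the demo
rows, checks = keygen(N, R, K, T, rng)
codeword = encode(rows, K, rng)
noisy = substitute(codeword, 0.05, rng)
uniform = [rng.getrandbits(1) for _ in range(N)]
print(detect(noisy, checks), detect(uniform, checks))
```

With three positions per check and a 5% flip rate, a check on a noisy codeword stays satisfied with probability $(1 + 0.9^3)/2 \approx 0.86$, while a check on a uniform string is satisfied with probability $1/2$. The threshold sits between the two.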

Figures (6)

  • Figure 1: The signature forgery experiment $\mathsf{SigForge}_{\mathcal{A}, \Pi}(\lambda)$
  • Figure 2: Objects from the proof of the deletion-channel lemma (lemma:deletion-channel): $y \gets \mathsf{MajEnc}_m(x)$, $z \gets \text{BSC}_q \circ \text{BDC}_p(y)$, the function $f$, and the partitions $\mathcal{R}, \mathcal{S}$ for $n = 3, m = 5$. In this illustration $x = (1, 1, 0)$ and $\mathsf{MajDec}_3(z) = (0, 1, 0)$. There are deletions at locations $D = \{3, 11\}$ (indicated by solid red), and there are errors from $\text{BSC}_q$ at locations 2 and 4 (indicated by hatched red). The arrows indicate the mapping $f$, i.e. an arrow points from an index in $z$ to the index in $y$ from which it originated.
  • Figure 3: Watermark setup procedure $\mathsf{Setup}(1^\lambda)$
  • Figure 4: Publicly attributable watermark setup procedure $\mathsf{Setup}(1^\lambda)$
  • Figure 5: The attribution forgery experiment $\mathsf{AttrForge}_{\mathcal{A}, \mathcal{W}_{\mathsf{att}}}(\lambda)$
  • ...and 1 more figure
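
The example in the Figure 2 caption can be traced by hand. The sketch below does so in Python under stated assumptions: `maj_enc` repeats each bit $m$ times, `maj_dec` splits its input into $n$ nearly equal contiguous blocks (block $j$ spans indices $\lfloor j|z|/n \rfloor$ to $\lfloor (j+1)|z|/n \rfloor$) and breaks ties towards 0, and the error locations 2 and 4 are read as 1-indexed positions in $z$, after deletion. The caption fixes none of these conventions, and all helper names are hypothetical.

```python
def maj_enc(x, m):
    """Repeat every bit of x exactly m times."""
    return [b for b in x for _ in range(m)]

def delete(y, positions):
    """Drop the given 1-indexed positions (one fixed outcome of a deletion channel)."""
    drop = set(positions)
    return [b for i, b in enumerate(y, start=1) if i not in drop]

def flip(z, positions):
    """Flip the bits at the given 1-indexed positions (one fixed outcome of a BSC)."""
    flip_at = set(positions)
    return [b ^ 1 if i in flip_at else b for i, b in enumerate(z, start=1)]

def maj_dec(z, n):
    """Majority vote over n nearly equal contiguous blocks; ties decode to 0 (assumption)."""
    out = []
    for j in range(n):
        block = z[j * len(z) // n : (j + 1) * len(z) // n]
        out.append(1 if 2 * sum(block) > len(block) else 0)
    return out

x = [1, 1, 0]
y = maj_enc(x, 5)                      # 15 bits: ten 1s, then five 0s
z = flip(delete(y, {3, 11}), {2, 4})   # 13 bits after two deletions
decoded = maj_dec(z, 3)
print(decoded)  # [0, 1, 0]
```

Under these conventions the first block of $z$ is $(1, 0, 1, 0)$, a tie that decodes to 0, so the output $(0, 1, 0)$ matches the caption and differs from $x$ in the first bit.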

Theorems & Definitions (108)

  • Definition (def:skPRC, def:pkPRC)
  • Theorem (theorem:ldpc-prc-lpn, theorem:ldpc-prc-xor)
  • Theorem (theorem:constant-rate-prcs)
  • Corollary
  • Theorem (theorem:deletion-code)
  • Corollary
  • Theorem (theorem:robust-water)
  • Theorem (theorem:deletion-robust-water)
  • Theorem (theorem:watermark-att)
  • Theorem (theorem:language-model-stego)
  • ...and 98 more