Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy

Andrii Shportko

Abstract

Large language models can rewrite text to embed hidden payloads while preserving surface-level meaning, a capability that opens covert channels between cooperating AI systems and poses challenges for alignment monitoring. We study the information-theoretic cost of such embedding. Our main result is that any steganographic scheme that preserves the semantic load of a covertext $M_1$ while encoding a payload $P$ into a stegotext $M_2$ must satisfy $K(M_2) \geq K(M_1) + K(P) - O(\log n)$, where $K$ denotes Kolmogorov complexity and $n$ is the combined message length. A corollary is that any non-trivial payload forces a strict complexity increase in the stegotext, regardless of how cleverly the encoder distributes the signal. Because Kolmogorov complexity is uncomputable, we ask whether practical proxies can detect this predicted increase. Drawing on the classical correspondence between lossless compression and Kolmogorov complexity, we argue that language-model perplexity occupies an analogous role in the probabilistic regime and propose the Binoculars perplexity-ratio score as one such proxy. Preliminary experiments with a color-based LLM steganographic scheme support the theoretical prediction: a paired $t$-test over 300 samples yields $t = 5.11$, $p < 10^{-6}$.
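
The compression correspondence mentioned in the abstract can be illustrated with a small sketch. It assumes zlib as a stand-in compressor (its output length is only a crude upper-bound proxy for $K$), a toy covertext, and naive concatenation as the "embedding". None of these choices come from the paper, and this is not the Binoculars score.

```python
import random
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (crude upper-bound proxy for K)."""
    return len(zlib.compress(data, 9))


# Toy covertext: ordinary, fairly redundant English (illustrative only).
cover = b"The meeting is moved to Thursday at noon. Please bring the quarterly report. " * 4

# Toy payload: 64 pseudo-random bytes, i.e. close to incompressible.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(64))

# Naive "embedding" by concatenation, used here only to make the effect visible.
stego = cover + payload

cover_len = compressed_len(cover)
payload_len = compressed_len(payload)
stego_len = compressed_len(stego)
print(cover_len, payload_len, stego_len)
```

In this toy setting the compressed stegotext is longer than the compressed covertext by roughly the size of the payload, which is the qualitative behavior the bound predicts. A real LLM-based scheme hides the payload in word choice rather than by appending bytes, so a general-purpose compressor may be far less sensitive there.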

Paper Structure (19 sections, 3 theorems, 30 equations, 4 figures, 1 table)

Key Result

Theorem 3

Let $M_2 = \mathrm{Embed}(M_1, P)$ for a lossless semantic-preserving embedding. Let $n = |M_1| + |M_2|$. Then

$$K(M_2) \geq K(M_1) + K(P) - O(\log n).$$
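
One way to read the bound $K(M_2) \geq K(M_1) + K(P) - O(\log n)$ is sketched here. It assumes that the $O(\log n)$ term is at most $c \log n$ for some constant $c$; the constant $c$ is notation introduced for this illustration and is not taken from the paper. Rearranging gives

$$K(M_2) - K(M_1) \;\geq\; K(P) - c \log n,$$

so the complexity gap is strictly positive whenever $K(P) > c \log n$. This is presumably the sense in which a non-trivial payload forces a strict increase (Corollary 4).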

Figures (4)

  • Figure 1: Per-color decoding accuracy for encoded (solid) vs. paraphrased (hatched) text. Dashed line: chance level ($12.5\%$).
  • Figure 2: Confusion matrices. Left: direct decoding ($95.7\%$). Right: after adversarial paraphrasing ($54.8\%$). Paraphrasing introduces a strong bias toward black.
  • Figure 3: Binoculars score distributions for encoded vs. paraphrased text. Encoding significantly increases scores, consistent with Theorem 3.
  • Figure 4: Paired Binoculars score differences (paraphrased $-$ encoded). The negative mean ($-0.310$) confirms that encoding increases complexity.
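
The paired comparison behind Figures 3 and 4 can be sketched as follows. This is a minimal illustration of a paired $t$ statistic using only the Python standard library. The function name `paired_t_statistic` and the score values are hypothetical, and the numbers are not the paper's data.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired t statistic for samples a and b (tests the mean of a[i] - b[i] against 0)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Made-up scores for illustration only; these are NOT the paper's data.
encoded_scores = [1.10, 1.25, 0.98, 1.30, 1.05]
paraphrased_scores = [0.90, 1.00, 0.95, 1.02, 0.88]

# Differences are taken as encoded - paraphrased, so a positive t means
# encoding raised the score. Figure 4 uses the opposite sign convention.
t_stat = paired_t_statistic(encoded_scores, paraphrased_scores)
print(f"t = {t_stat:.2f} with {len(encoded_scores) - 1} degrees of freedom")
```

Pairing each encoded text with its own paraphrase removes between-text variation from the comparison, which is why a paired test suits this design better than an unpaired one.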

Theorems & Definitions (8)

  • Definition 1: Identical Semantic Load
  • Definition 2: Lossless Semantic-Preserving Embedding
  • Theorem 3
  • Proof
  • Corollary 4: Strict Complexity Increase
  • Definition 5: Quadroculars Score
  • Proposition 1
  • Proof