
Provable Secure Steganography Based on Adaptive Dynamic Sampling

Kaiyi Pang, Minhao Bai

TL;DR

This work tackles the limitation of Provably Secure Steganography (PSS) methods that require explicit model distributions by introducing Adaptive Dynamic Sampling (ADS), a seed-based sampling scheme that preserves the model's distribution without direct access. ADS encodes secret bits via a dynamic collision-set mechanism that maps N-bit prefixes to sampled tokens, expanding the set as needed to maintain capacity while ensuring correct decoding. The authors provide a formal security proof under standard PRG-based indistinguishability and demonstrate through extensive experiments on three LLMs and three datasets that ADS achieves high embedding capacity and entropy utilization with competitive efficiency and deterministic decoding. The approach significantly broadens the practicality of PSS by enabling secure embedding through API-like access, with strong theoretical guarantees and real-world performance comparable to distribution-dependent methods. Overall, ADS offers a scalable, provably secure steganographic framework for covert communication over modern language-model APIs.

Abstract

The security of private communication is increasingly at risk due to widespread surveillance. Steganography, a technique for embedding secret messages within innocuous carriers, enables covert communication over monitored channels. Provably Secure Steganography (PSS), which ensures computational indistinguishability between normal model output and steganographic output, represents the state of the art in this field. However, current PSS methods often require access to the model's explicit output distribution. In this paper, we propose a provably secure steganography scheme that only requires a model API that accepts a seed as input. Our core mechanism samples a candidate set of tokens and constructs a map from possible message bit strings to these tokens. The output token is selected by applying this mapping to the real secret message, which provably preserves the original model's distribution. To ensure correct decoding, we handle collision cases, where multiple candidate messages map to the same token, by maintaining and strategically expanding a dynamic collision set within a bounded size range. Extensive evaluations on three real-world datasets and three large language models demonstrate that our sampling-based method is comparable to existing PSS methods in efficiency and capacity.
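The abstract describes the collision-set mechanism only in prose. Below is a minimal, self-contained sketch of one way such a seed-based scheme can work, under loudly stated assumptions: `model_sample` is a toy stand-in for a model API that accepts a seed, `derive_seed` (HMAC-SHA256 over the step index and candidate bits) is a hypothetical keyed PRF, the message length is known to both parties, and the expansion size $L$ is picked by a hypothetical "largest expansion under the $2^{N}$ cap" rule. All names are illustrative, not the authors' implementation.

```python
import hashlib
import hmac
import random

# Toy stand-in for a model API that accepts a seed (assumption): a fixed
# next-token distribution. A real LLM would also condition on `context`.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat"]
WEIGHTS = [0.25, 0.2, 0.15, 0.1, 0.1, 0.1, 0.05, 0.05]


def model_sample(context, seed):
    """Hypothetical seeded API: the same (context, seed) always gives the same token."""
    rng = random.Random(seed)
    return rng.choices(VOCAB, weights=WEIGHTS, k=1)[0]


def derive_seed(key, step, candidate):
    """Hypothetical keyed PRF: HMAC-SHA256 over the step index and candidate bits."""
    mac = hmac.new(key, f"{step}|{candidate}".encode(), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big")


def expansion_bits(cs_size, n_max, remaining):
    """Largest L with cs_size * 2**L <= 2**n_max, capped by the bits left to embed."""
    L = 0
    while L < remaining and cs_size * 2 ** (L + 1) <= 2 ** n_max:
        L += 1
    return L


def sample_candidates(key, t, context, cs, n_max, total_bits):
    """Extend every candidate by all L-bit suffixes, then sample one token per candidate."""
    L = expansion_bits(len(cs), n_max, total_bits - len(cs[0]))
    suffixes = [format(i, f"0{L}b") if L else "" for i in range(2 ** L)]
    candidates = [c + s for c in cs for s in suffixes]
    return {c: model_sample(context, derive_seed(key, t, c)) for c in candidates}


def encode(key, bits, n_max=2, max_steps=200):
    """Emit tokens until the collision set shrinks to the full secret message."""
    cs, tokens = [""], []
    for t in range(max_steps):
        if len(cs) == 1 and len(cs[0]) == len(bits):
            break
        mapping = sample_candidates(key, t, tuple(tokens), cs, n_max, len(bits))
        width = len(next(iter(mapping)))
        token = mapping[bits[:width]]  # the output token follows the real message only
        tokens.append(token)
        # Candidates whose sampled token collides with the output stay ambiguous.
        cs = [c for c, tok in mapping.items() if tok == token]
    return tokens


def decode(key, tokens, n_bits, n_max=2):
    """Replay the same seeded sampling and keep candidates consistent with each token."""
    cs = [""]
    for t, token in enumerate(tokens):
        mapping = sample_candidates(key, t, tuple(tokens[:t]), cs, n_max, n_bits)
        cs = [c for c, tok in mapping.items() if tok == token]
    if len(cs) != 1:
        raise ValueError("collision set not resolved; more tokens are needed")
    return cs[0]
```

With `n_max=2` the collision set never holds more than four candidates. The emitted token is always drawn with the seed derived from the real message prefix, so to anyone without the key it looks like an ordinary seeded sample from the model; this is the intuition behind the distribution-preservation claim, not a proof. The receiver needs only the key, the observed tokens, and (in this sketch) the message length, and replays the same samples, so `decode(key, encode(key, "1011"), 4)` recovers `"1011"` deterministically.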

Paper Structure

This paper contains 36 sections, 11 equations, 5 figures, 5 tables, 2 algorithms.

Figures (5)

  • Figure 1: Overall framework of covert communication.
  • Figure 2: An example of the encoding algorithm when the maximum size of the collision set is 4 at each time step ($N$=2). Here, $\textbf{m}$ represents the secret message, $s_t$ is the stego token output at time $t$, and $seed_{10}$ represents the seed for candidate message $10$. The arrows ($\downarrow$) indicate the sampling outcomes (balls) produced with the $seed$ of each candidate message. Red bits in $\textbf{m}$ denote secret bits that have already been embedded, and underlined bits in $\textbf{m}$ denote secret bits that can be immediately extracted.
  • Figure 3: Illustration of dynamically determining the number of samples and expanding the secret message. (a) Non-expansion case $L=0$: at step $t+1$, the number of samples equals $|CS|$, the size of the collision set at step $t$. (b) General expansion case $0<L\le N$: at step $t+1$, the number of samples is $2^{L}\cdot|CS|$.
  • Figure 4: Games used in the proof of steganography security.
  • Figure 5: The relationship between the number of uniform distribution samples and embedding capacity. $N$ sets the maximum number of samples to $2^{N}$, the red dashed line indicates the entropy, and the green line shows the generation time per token.
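Figure 3 gives the number of samples at step $t+1$ as $2^{L}\cdot|CS_t|$, and the captions of Figures 2 and 5 bound it by $2^{N}$. Assuming $L$ is chosen as the largest value that respects that bound (a hypothetical selection rule; the paper may choose $L$ differently), the arithmetic for $N=2$ works out as follows:

```latex
n_{t+1} = 2^{L}\,\lvert CS_t\rvert,
\qquad
L = \max\{\ell \in \{0,\dots,N\} : 2^{\ell}\,\lvert CS_t\rvert \le 2^{N}\}

% Worked values for N = 2 (cap 2^N = 4):
\begin{array}{c|c|c}
\lvert CS_t\rvert & L & n_{t+1} \\ \hline
1 & 2 & 4 \\
2 & 1 & 4 \\
3 & 0 & 3 \\
4 & 0 & 4
\end{array}
```

Under this rule the non-expansion case of Figure 3(a) arises exactly when $2\lvert CS_t\rvert > 2^{N}$, i.e. when doubling the collision set would exceed the cap.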

Theorems & Definitions (4)

  • Definition 1: Sampleable channel
  • Definition 2: Steganography scheme
  • Definition 3: Correctness
  • Definition 4: Security
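The list gives only the definition names. For orientation, correctness and security in the PSS literature are commonly stated in the forms below, where $k$ is the shared key, $h$ the channel history, $\lambda$ the security parameter, $\mathcal{O}_{\mathcal{C}}$ an oracle that returns honest channel samples, and $\mathrm{negl}$ a negligible function. This is the usual chosen-hiddentext formulation, offered as an assumption about the shape of Definitions 3 and 4, not a transcription of them.

```latex
% Correctness (typical form of Definition 3)
\Pr\big[\mathrm{Decode}_k(\mathrm{Encode}_k(m, h), h) = m\big] \ge 1 - \mathrm{negl}(\lambda)

% Security (typical form of Definition 4): for every PPT adversary A,
\mathrm{Adv}_{A}(\lambda) =
\Big\lvert \Pr\big[A^{\mathrm{Encode}_k(\cdot,\cdot)}(1^{\lambda}) = 1\big]
- \Pr\big[A^{\mathcal{O}_{\mathcal{C}}(\cdot,\cdot)}(1^{\lambda}) = 1\big] \Big\rvert
\le \mathrm{negl}(\lambda)
```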