Dynamically Allocated Interval-Based Generative Linguistic Steganography with Roulette Wheel

Yihao Wang, Ruiqi Song, Lingxiao Li, Ru Zhang, Jianyi Liu

TL;DR

This paper tackles the problem of covert linguistic communication by addressing the uniform coding bias in prior steganography schemes. It introduces DAIRstega, a dynamic, roulette-wheel-based embedding scheme that allocates token intervals according to their conditional probabilities (CPs), which enables higher-quality stegos and stronger anti-steganalysis. The method supports prompt-based instructions and uses a CP-driven embedding function with tunable parameters $\alpha$ and $\beta$ to modulate payload and diversity. This function includes a simple expression for interval allocation, $n_j = \lfloor (p'_j / \sum_k p'_k)^{\beta} \times (2^{\alpha} - 1) \rfloor$. Experiments run on MindSpore with open-source LLMs show improvements in perceptual, statistical, and semantic concealment and in anti-steganalysis. They also show that the method can generate longer stegos and can serve as a secure watermarking approach. The work offers a practical, CP-aware framework with potential applications in watermarking and covert communication, and it outlines directions for future research on long stegos and on advances in steganalysis.
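The Python sketch below implements the interval-allocation expression above. It takes the CPs $p'_j$ of the candidate-pool tokens, the payload parameter $\alpha$ (the expression uses $2^\alpha - 1$ slots) and the exponent $\beta$, and it returns the raw lengths $n_j$. The helper name `allocate_intervals` and the example CPs are hypothetical. The paper's three allocation constraints are not reproduced, so the sketch only illustrates the formula and is not DAIRstega's full allocation procedure.

```python
import math
from typing import List


def allocate_intervals(cps: List[float], alpha: int, beta: float) -> List[int]:
    """Raw interval lengths n_j = floor((p'_j / sum_k p'_k) ** beta * (2 ** alpha - 1)).

    Hypothetical helper: DAIRstega's three allocation constraints are NOT applied,
    so the lengths may leave slots unused or, for beta < 1, overflow the area.
    """
    if alpha < 1:
        raise ValueError("alpha must be a positive integer")
    total = sum(cps)
    if total <= 0:
        raise ValueError("conditional probabilities must sum to a positive value")
    area = 2 ** alpha - 1  # roulette-area size used in the expression
    return [math.floor((p / total) ** beta * area) for p in cps]


# Example candidate pool whose CPs are exact binary fractions.
pool_cps = [0.5, 0.25, 0.125, 0.125]
print(allocate_intervals(pool_cps, alpha=4, beta=1.0))  # [7, 3, 1, 1]
print(sum(allocate_intervals(pool_cps, alpha=4, beta=0.5)))  # 27, which exceeds 2**4 - 1 = 15
```

With $\beta = 1$, the lengths follow the CPs. With $\beta = 0.5$, the distribution is flattened and the raw lengths sum to more than $2^\alpha - 1$. This suggests one reason why a practical allocation needs constraints beyond the raw expression.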

Abstract

Existing linguistic steganography schemes often overlook the conditional probability (CP) of tokens in the candidate pool and assign exactly one coding to every token, which gives all tokens identical selection likelihoods. As a result, low-CP tokens are often selected, which degrades the quality of stegos and makes them more detectable. This paper proposes an interval-allocation-based scheme called DAIRstega. DAIRstega first uses a portion of the secret bits it reads to build the roulette area. It then applies the idea of the roulette wheel and uses the CPs of tokens as the main basis for allocating the roulette area (i.e., the interval length), so that tokens with larger CPs receive more area. The secret is therefore more likely to select a token with a higher CP. To optimize the allocation, we design several allocation functions and three constraints. DAIRstega also supports prompt-based controllable generation of stegos. Extensive experiments show that the proposed embedding method and DAIRstega outperform existing embedding methods and baselines. They demonstrate strong perceptual, statistical, and semantic concealment, as well as strong anti-steganalysis ability. DAIRstega can also generate high-quality longer stegos, which addresses deficiencies of existing methods in this task. Finally, DAIRstega is shown to have potential as a secure watermarking scheme, which offers insights for the development of watermarking.
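The roulette-wheel selection described above can be sketched as follows. The secret bits read in are interpreted as an integer, and the token whose interval contains that integer is selected. This is a simplified illustration and not the paper's algorithm. The helper `select_token` and the interval lengths are hypothetical. The sketch assumes that the lengths already cover the whole area $[0, 2^{\alpha})$ exactly, and it omits both extraction and the allocation constraints.

```python
import bisect
from itertools import accumulate
from typing import List


def select_token(intervals: List[int], secret_bits: str) -> int:
    """Return the index of the token whose roulette interval contains the secret value.

    Simplified sketch (hypothetical helper): assumes the interval lengths exactly
    cover [0, 2 ** len(secret_bits)), which a real allocation must guarantee.
    """
    value = int(secret_bits, 2)  # e.g. "1010" -> 10
    bounds = list(accumulate(intervals))  # exclusive right end of each interval
    if not bounds or bounds[-1] != 2 ** len(secret_bits):
        raise ValueError("intervals must exactly cover the roulette area")
    return bisect.bisect_right(bounds, value)


# Hypothetical lengths for CPs 0.5, 0.25, 0.125, 0.125 with alpha = 4 (16 slots).
intervals = [8, 4, 2, 2]
print(select_token(intervals, "1010"))  # 1 (value 10 lies in [8, 12))

# Over all 16 possible secrets, each token is chosen as often as its area allows.
counts = [0] * len(intervals)
for v in range(16):
    counts[select_token(intervals, format(v, "04b"))] += 1
print(counts)  # [8, 4, 2, 2]
```

Under uniform coding, each of the four tokens would be selected by exactly a quarter of the possible secrets. With the roulette-wheel lengths shown here, the highest-CP token is selected by half of them.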

Paper Structure (34 sections, 10 equations, 4 figures, 15 tables, 1 algorithm)

Figures (4)

  • Figure 1: Examples of existing embedding methods and the proposed embedding method. After the $n$ tokens with the largest CPs are selected to build a candidate pool, existing methods assign one coding to each token in the pool. Because the secret bits can be arbitrary, existing methods select every token with the same likelihood, $1/n$. The proposed method instead builds a roulette area and allocates it according to CP values, so that a token with a larger CP receives a larger area. The proposed method therefore has a higher likelihood of selecting a better token, and the tokens it selects are generally of higher quality than those selected by existing methods.
  • Figure 2: Results of large language models (LLMs) combined with different embedding methods. "$\uparrow$" ("$\downarrow$") indicates that a higher (lower) value is better. The metrics are described in Section \ref{sec31}, and the values are listed in Table \ref{embedding}.
  • Figure 3: Examples of texts being transmitted and detected. Responses that fit the conversation, such as Alice 1, are difficult to perceive, which reduces the risk of steganalysis.
  • Figure 4: The framework of DAIRstega. The "stego generation process" consists of two modules: "CP (conditional probability) generation" and "Embedding". Unlike existing works, DAIRstega uses the idea of a non-uniform roulette wheel to dynamically allocate different roulette areas (numbers of codings) to the tokens in the candidate pool. As a result, the secret is more likely to fall on tokens with larger CPs, which preserves the quality of stegos. In addition, DAIRstega accepts an instruction as well as the secret information (as a bitstream), and it generates a stego that conforms to the instruction.
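A small calculation makes the contrast in Figure 1 concrete. The sketch below builds a top-$n$ candidate pool and then compares two selection schemes, assuming uniformly random secret bits. The first is uniform coding. The second is an idealized roulette wheel whose areas are exactly proportional to the renormalized CPs. For each scheme, the sketch reports the selection likelihoods and the expected CP of the chosen token. The helper names and the toy distribution are hypothetical, and the idealization ignores the floor rounding and the constraints of a real allocation.

```python
from typing import Dict, List, Tuple

Pool = List[Tuple[str, float]]


def candidate_pool(probs: Dict[str, float], n: int) -> Pool:
    """Keep the n tokens with the largest conditional probabilities."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:n]


def selection_likelihoods(pool: Pool) -> Tuple[List[float], List[float]]:
    """Uniform coding (1/n per token) vs. an idealized CP-proportional roulette wheel."""
    n = len(pool)
    uniform = [1.0 / n] * n
    total = sum(p for _, p in pool)
    roulette = [p / total for _, p in pool]  # area share equals renormalized CP
    return uniform, roulette


def expected_cp(pool: Pool, likelihoods: List[float]) -> float:
    """Expected CP of the selected token, assuming uniformly random secret bits."""
    return sum(p * q for (_, p), q in zip(pool, likelihoods))


# Hypothetical next-token distribution.
probs = {"the": 0.5, "a": 0.25, "this": 0.125, "that": 0.0625, "zebra": 0.03125}
pool = candidate_pool(probs, n=4)
uniform, roulette = selection_likelihoods(pool)
print(round(expected_cp(pool, uniform), 4), round(expected_cp(pool, roulette), 4))  # 0.2344 0.3542
```

In this toy example, the roulette wheel raises the expected CP of the chosen token from about 0.23 to about 0.35. This is the qualitative effect that Figure 1 describes.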