Dynamically Allocated Interval-Based Generative Linguistic Steganography with Roulette Wheel
Yihao Wang, Ruiqi Song, Lingxiao Li, Ru Zhang, Jianyi Liu
TL;DR
This paper tackles covert linguistic communication by addressing the uniform coding bias of prior steganography schemes. It introduces DAIRstega, a dynamic roulette-wheel-based embedding scheme that allocates token intervals in proportion to their conditional probabilities (CPs), yielding higher-quality stegos and stronger anti-steganalysis ability. The method supports prompt-based instructions and uses a CP-driven embedding function whose tunable parameters $\alpha$ and $\beta$ modulate payload and diversity. Interval allocation follows a simple expression, $n_j = \left\lfloor \left( p'_j / \sum_k p'_k \right)^{\beta} \times (2^{\alpha} - 1) \right\rfloor$. Empirical results on MindSpore with open-source LLMs show improvements in perceptual, statistical, and semantic concealment as well as anti-steganalysis, and demonstrate that the method can generate longer stegos and serve as a secure watermarking approach. The work offers a practical, CP-aware framework with potential applications in watermarking and covert communication, and outlines future research directions on long stegos and advances in steganalysis.
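The interval-allocation expression can be evaluated directly. The sketch below is a minimal illustration, not the authors' implementation: it treats the $p'_j$ as given non-negative scores for the candidate tokens (how the paper derives them is not reproduced) and assumes, as the $2^{\alpha}$ term suggests, that $\alpha$ is the number of secret bits read per step. The function name `allocate_intervals` is hypothetical.

```python
import math
from typing import List, Sequence


def allocate_intervals(cps: Sequence[float], alpha: int, beta: float) -> List[int]:
    """Hypothetical helper: interval length n_j for each candidate token,
    computed as floor((p'_j / sum_k p'_k) ** beta * (2**alpha - 1)).
    The paper's extra allocation functions and constraints are NOT applied."""
    total = sum(cps)            # normaliser sum_k p'_k
    scale = 2 ** alpha - 1      # size term from the formula
    return [math.floor((p / total) ** beta * scale) for p in cps]


if __name__ == "__main__":
    cps = [4, 2, 1, 1]  # illustrative scores p'_j, not data from the paper
    print(allocate_intervals(cps, alpha=4, beta=1.0))  # [7, 3, 1, 1]
    print(allocate_intervals(cps, alpha=4, beta=0.5))  # [10, 7, 5, 5]
```

Since $(p_a/p_b)^{\beta}$ shrinks toward 1 as $\beta$ decreases, $\beta < 1$ narrows the gap between high- and low-CP intervals and $\beta > 1$ widens it. With $\beta < 1$ the lengths can also sum to more than $2^{\alpha} - 1$ (27 versus 15 above), which is one reason the paper adds constraints that this sketch omits.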
Abstract
Existing linguistic steganography schemes often overlook the conditional probability (CP) of tokens in the candidate pool and assign the same coding to all tokens, giving them identical selection likelihoods. This leads to the selection of low-CP tokens, which degrades the quality of stegos and makes them more detectable. This paper proposes an interval-allocation-based scheme called DAIRstega. DAIRstega first uses a portion of the secret that has been read to build the roulette area. It then applies the idea of the roulette wheel, taking the CPs of tokens as the main basis for allocating the roulette area (i.e., the interval length), so tokens with larger CPs receive more area. As a result, the secret is more likely to select a token with a higher CP. To optimize the allocation, we design several allocation functions and three constraints. Additionally, DAIRstega supports prompt-based controllable generation of stegos. Extensive experiments show that the proposed embedding method and DAIRstega outperform existing methods and baselines, exhibiting strong perceptual, statistical, and semantic concealment as well as anti-steganalysis ability. DAIRstega can also generate high-quality longer stegos, addressing existing deficiencies in this task. Finally, DAIRstega is confirmed to have potential as a secure watermarking scheme, offering insights for the development of watermarking.
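The roulette-wheel selection step described in the abstract can be sketched as follows. This is a simplified, hypothetical illustration of the sender side only: the area of $2^{\alpha}$ slots is split in proportion to CP, and leftover slots from flooring go to the highest-CP token. That remainder rule is an assumption made here so the intervals exactly cover the area; it is not one of the paper's allocation functions or constraints. The helper names `build_roulette` and `select_token` are illustrative, and bit extraction at the receiver is not shown.

```python
import bisect
import math
from typing import List, Sequence


def build_roulette(cps: Sequence[float], alpha: int) -> List[int]:
    """Hypothetical: split a roulette area of 2**alpha slots among tokens in
    proportion to their CPs. Leftover slots from flooring are given to the
    highest-CP token (an illustrative rule, not the paper's constraints)."""
    area = 2 ** alpha
    total = sum(cps)
    lengths = [math.floor(p / total * area) for p in cps]
    best = max(range(len(cps)), key=lambda j: cps[j])
    lengths[best] += area - sum(lengths)  # intervals now cover the whole area
    return lengths


def select_token(cps: Sequence[float], bits: str) -> int:
    """Hypothetical: read the secret bits as an integer pointer into the
    roulette area and return the index of the token whose interval holds it."""
    lengths = build_roulette(cps, alpha=len(bits))
    # Right boundaries: token j owns slots [bounds[j-1], bounds[j]).
    bounds: List[int] = []
    acc = 0
    for n in lengths:
        acc += n
        bounds.append(acc)
    pointer = int(bits, 2)
    # First boundary strictly greater than the pointer; zero-length
    # intervals are skipped automatically.
    return bisect.bisect_right(bounds, pointer)


if __name__ == "__main__":
    cps = [4, 2, 1, 1]  # illustrative CPs, not data from the paper
    print([select_token(cps, format(s, "03b")) for s in range(8)])
```

Here the token with half of the CP mass owns four of the eight slots, so if the secret bits are uniformly random (e.g., after encryption), it is chosen with probability 4/8. Tokens that receive zero slots can never be selected, which mirrors the abstract's point that CP-proportional areas steer the secret away from low-CP tokens.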
