Shifting-Merging: Secure, High-Capacity and Efficient Steganography via Large Language Models
Minhao Bai, Jinshuai Yang, Kaiyi Pang, Yongfeng Huang, Yue Gao
TL;DR
ShiMer addresses secure, high-capacity text steganography by using the explicit next-token distributions of large language models. It encodes secret bits by pseudorandomly shifting and merging probability intervals, and decoding mirrors the process; a reordering step further reduces interval-splitting errors. The method achieves provable security, high embedding and utilization rates, and favorable channel capacity across multiple models, while keeping text quality close to that of random sampling. This approach offers practical privacy protection in censorship-prone environments and can extend to other autoregressive domains, though it requires pre-shared keys or pseudorandom generators (PRGs) to operate and does not alter the model's entropy. Comprehensive experiments and analyses indicate that interval-shifting encoding can outperform prior secure steganography techniques in both capacity and efficiency, and the security claim is formally justified via $D_{KL}(P_S \| P_C) = 0$.
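For reference, the security condition can be written out with the standard definition of Kullback-Leibler divergence. This assumes, as is conventional in steganography, that $P_S$ denotes the distribution of steganographic text and $P_C$ the distribution of ordinary cover text sampled from the model; the summary itself does not define these symbols.

$$
D_{KL}(P_S \| P_C) = \sum_{x} P_S(x) \log \frac{P_S(x)}{P_C(x)}, \qquad D_{KL}(P_S \| P_C) = 0 \iff P_S = P_C .
$$

Under this reading, a divergence of zero means the two distributions coincide, so no statistical test can tell steganographic text from ordinary model output.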
Abstract
In the face of escalating surveillance and censorship in cyberspace, personal privacy is under threat, which motivates the development of steganography: a way to securely hide messages within innocent-looking texts. Previous methods alter the texts to hide private messages, which is not secure. Large Language Models (LLMs) provide a high-quality, explicit distribution, which is a mathematical tool that secure steganography methods can build on. However, existing attempts fail to achieve high capacity, time efficiency, and correctness simultaneously, and their tightly coupled designs leave little room for refining them to achieve better performance. To provide a secure, high-capacity, and efficient steganography method, we introduce ShiMer. Specifically, ShiMer pseudorandomly shifts the probability intervals of the LLM's distribution to obtain a private distribution, and samples a token according to the private bits. Steganographic texts produced by ShiMer are indistinguishable in quality from normal texts generated directly by the language model. To further increase the capacity of ShiMer, we design a reordering algorithm that minimizes the occurrence of interval splitting during the decoding phase. Experimental results indicate that our method achieves the highest capacity and efficiency among existing secure steganography techniques.
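The interval-shifting idea can be illustrated with a toy single-step sketch. This is not the ShiMer algorithm itself: it omits merging and reordering, treats the secret bits as a single point in [0, 1), and uses Python's `random.Random` as a stand-in for a shared PRG (it is not cryptographically secure). The function names `token_intervals`, `shared_shift`, `shifted_sample`, and `candidate_message_ranges` are hypothetical and chosen only for this illustration.

```python
import random
from itertools import accumulate


def token_intervals(probs):
    """Map each token to its cumulative interval [lo, hi) on [0, 1)."""
    highs = list(accumulate(probs))
    lows = [0.0] + highs[:-1]
    return list(zip(lows, highs))


def shared_shift(key, step):
    """Stand-in for a shared PRG: both parties derive the same shift in [0, 1)."""
    return random.Random(f"{key}:{step}").random()


def shifted_sample(probs, message_point, shift):
    """Pick the token whose interval contains (message_point + shift) mod 1."""
    point = (message_point + shift) % 1.0
    for token, (lo, hi) in enumerate(token_intervals(probs)):
        if lo <= point < hi:
            return token
    return len(probs) - 1  # guard against floating-point round-off


def candidate_message_ranges(probs, token, shift):
    """Undo the shift: ranges of message_point consistent with the token."""
    lo, hi = token_intervals(probs)[token]
    lo_s, hi_s = (lo - shift) % 1.0, (hi - shift) % 1.0
    if lo_s < hi_s:
        return [(lo_s, hi_s)]
    # The shifted interval wraps around 1.0, so it splits into two pieces.
    return [(lo_s, 1.0), (0.0, hi_s)]


probs = [0.5, 0.25, 0.25]   # toy next-token distribution
shift = 0.625               # in practice: shared_shift(key, step)
token = shifted_sample(probs, 0.0625, shift)
ranges = candidate_message_ranges(probs, token, shift)
```

Because the shift is uniform on [0, 1), the shifted point is uniform as well, so in this sketch each token is chosen with exactly its model probability. The second call shows the wrap-around case: the receiver recovers two disjoint candidate ranges for the message point, which is the kind of interval splitting the reordering step is described as reducing.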
