Provably Secure Robust Image Steganography via Cross-Modal Error Correction
Yuang Qi, Kejiang Chen, Na Zhao, Zijin Yang, Weiming Zhang
TL;DR
CMSteg addresses provably secure, robust image steganography for high-quality autoregressive (AR) image generation models. It uses a three-module pipeline of secure embedding (M1), discrete token optimization (M2), and cross-modal error correction (M3) to produce high-quality stego images while preserving provable security under standard provably secure steganography (PSS) definitions. The cross-modal framework carries error-correction information in semantically aligned stego text produced by a vision-language model, which improves robustness against lossy channels such as JPEG compression. Formal security guarantees are paired with empirical validation showing high image quality, strong robustness, and indistinguishability from covers ($\text{Pr}_A[\text{distinguish}] \approx 0.5$). In practice, CMSteg enables high-resolution, semantically controlled, secure steganography suited to dissemination on online social networks (OSNs), coupling image generation with text-based error correction to reliably recover hidden messages.
Abstract
The rapid development of image generation models has facilitated the widespread dissemination of generated images on social networks, creating favorable conditions for provably secure image steganography. However, existing methods suffer from low quality of generated images and a lack of semantic control over the generation process. We aim to apply provably secure steganography to more effective, higher-performance image generation models, and to ensure that secret messages can still be accurately extracted from stego images after they are uploaded to social networks and subjected to lossy processing such as JPEG compression. To this end, we propose a high-quality, provably secure, and robust image steganography method based on state-of-the-art autoregressive (AR) image generation models that use Vector-Quantized (VQ) tokenizers. Additionally, we employ a cross-modal error-correction framework that generates stego text from stego images to aid in restoring lossy images, ultimately enabling the extraction of the secret messages embedded within them. Extensive experiments demonstrate that the proposed method provides advantages in stego quality, embedding capacity, and robustness, while ensuring provable undetectability.
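For readers unfamiliar with the term, "provable undetectability" is usually formalized along the following lines in the provably secure steganography literature. This is a sketch of the standard complexity-theoretic definition, not a statement taken from this paper: the symbols $\mathcal{A}$ (a probabilistic polynomial-time adversary), $\kappa$ (the security parameter), $\mathcal{O}_{\text{stego}}$ and $\mathcal{O}_{\text{cover}}$ (oracles returning stego images and ordinary generated images, respectively) are assumed notation.

$$
\mathrm{Adv}_{\mathcal{A}}(\kappa) = \left| \Pr\left[\mathcal{A}^{\mathcal{O}_{\text{stego}}} = 1\right] - \Pr\left[\mathcal{A}^{\mathcal{O}_{\text{cover}}} = 1\right] \right| \le \mathrm{negl}(\kappa)
$$

If the adversary is shown a stego or a cover image with equal probability, its chance of guessing correctly is at most $\tfrac{1}{2} + \tfrac{1}{2}\mathrm{Adv}_{\mathcal{A}}(\kappa)$. A negligible advantage therefore corresponds to a distinguishing probability of approximately $0.5$, which is the quantity reported in the TL;DR.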
