Provably Secure Robust Image Steganography via Cross-Modal Error Correction

Yuang Qi, Kejiang Chen, Na Zhao, Zijin Yang, Weiming Zhang

TL;DR

CMSteg addresses provably secure, robust image steganography in the era of high-quality autoregressive (AR) image generation. It introduces a three-module pipeline, consisting of secure embedding (M1), discrete token optimization (M2), and cross-modal error correction (M3), to produce high-quality stego images while preserving provable security under standard provably secure steganography (PSS) definitions. The cross-modal framework carries error-correction information in semantically aligned stego text produced by a vision-language model, which improves robustness against lossy channels such as JPEG compression. The method comes with formal security guarantees, and empirical validation shows high image quality, strong robustness, and indistinguishability from covers ($\text{Pr}_A[\text{distinguish}] \approx 0.5$). In practice, CMSteg enables high-resolution, semantically controlled, secure steganography suitable for dissemination on online social networks (OSNs), coupling image generation with text-based error correction to reliably recover hidden messages.

Abstract

The rapid development of image generation models has facilitated the widespread dissemination of generated images on social networks, creating favorable conditions for provably secure image steganography. However, existing methods suffer from low quality of generated images and a lack of semantic control in the generation process. To leverage provably secure steganography with more effective and high-performance image generation models, and to ensure that secret messages can still be accurately extracted from stego images after they are uploaded to social networks and subjected to lossy processing such as JPEG compression, we propose a high-quality, provably secure, and robust image steganography method based on state-of-the-art autoregressive (AR) image generation models using Vector-Quantized (VQ) tokenizers. Additionally, we employ a cross-modal error-correction framework that generates stego text from stego images to aid in restoring lossy images, ultimately enabling the extraction of secret messages embedded within the images. Extensive experiments demonstrate that the proposed method provides advantages in stego quality, embedding capacity, and robustness, while ensuring provable undetectability.

Paper Structure

This paper contains 23 sections, 18 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Provably Secure Image Steganography (PSIS) faces challenges in actual transmission within online social networks (OSNs).
  • Figure 2: Overview of the proposed provably secure robust image steganography method for high-quality images. It comprises three modules: the secure message embedding module, the discrete token optimization module, and the cross-modal error-correction module. The stego images and the stego text are transmitted together to social networks, enabling provably secure robust image steganography via cross-modal error correction.
  • Figure 3: An example of Discop’s embedding algorithm given a distribution {'a':[0,0.4),'b':[0.4,1.0)}. A copy of the distribution that has been shifted by $0.5$ is {'a':[0.5,0.9),'b':[0,0.5)$\cup$[0.9,1.0)}. A random number controlled by $K$ falls into a token interval, while the number will fall into another interval after it is offset by 0.5. Depending on the message bit, the token into whose interval the random number falls can be selected, which is equivalent to using a copy of the distribution to represent different message bits.
  • Figure 4: Flowchart of the discrete token optimization module used in the proposed provably secure and robust image steganography method.
  • Figure 5: Visual results of generated stego images. All images are scaled to a suitable display size at the same ratio. (a) Ours; (b) Discop-ImageGPT [ding2023discop]; (c) PARIS [yang2023provably].
  • ...and 1 more figure
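
The Figure 3 caption describes one step of distribution-copy embedding: a key-controlled random number selects a token from the original distribution for one message bit, and from a copy shifted by 0.5 for the other. The following is a minimal sketch of that single step using the caption's distribution {'a':[0,0.4),'b':[0.4,1.0)}, not the authors' implementation. The function names (`token_at`, `embed_step`, `extract_step`, `embed`, `extract`) are hypothetical, Python's `random.Random` stands in for a keyed cryptographic pseudorandom generator, and the fixed distribution stands in for a model's per-step token distribution. The sketch also assumes that no bit is embedded when both copies select the same token.

```python
import random

# Distribution from the Figure 3 caption: 'a' -> [0, 0.4), 'b' -> [0.4, 1.0).
DIST = [("a", 0.4), ("b", 0.6)]


def token_at(dist, r):
    """Return the token whose interval [lo, lo + p) contains r."""
    lo = 0.0
    for tok, p in dist:
        if lo <= r < lo + p:
            return tok
        lo += p
    return dist[-1][0]  # guard against floating-point rounding at the top end


def embed_step(dist, r, bit):
    """Pick a token for one message bit; return (token, bit_was_embedded).

    Reading r in the original distribution encodes bit 0. Reading
    (r + 0.5) mod 1 encodes bit 1, which is equivalent to reading r in a
    copy of the distribution shifted by 0.5.
    """
    t0 = token_at(dist, r)
    t1 = token_at(dist, (r + 0.5) % 1.0)
    if t0 == t1:
        # Assumption: both copies agree, so this step carries no bit.
        return t0, False
    return (t1 if bit else t0), True


def extract_step(dist, r, token):
    """Recover the bit from a token, or None if the step carried no bit."""
    t0 = token_at(dist, r)
    t1 = token_at(dist, (r + 0.5) % 1.0)
    if t0 == t1:
        return None
    return 0 if token == t0 else 1


def embed(dist, bits, key):
    """Emit tokens until every bit has been embedded."""
    rng = random.Random(key)  # stand-in for a keyed PRG shared by both parties
    tokens, i = [], 0
    while i < len(bits):
        tok, used = embed_step(dist, rng.random(), bits[i])
        tokens.append(tok)
        i += used
    return tokens


def extract(dist, tokens, key):
    """Replay the same random numbers and read back the embedded bits."""
    rng = random.Random(key)
    out = []
    for tok in tokens:
        bit = extract_step(dist, rng.random(), tok)
        if bit is not None:
            out.append(bit)
    return out


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1]
    stego_tokens = embed(DIST, message, key=7)
    assert extract(DIST, stego_tokens, key=7) == message
```

Because both `r` and `(r + 0.5) mod 1` are uniform on [0, 1), either choice samples tokens with the original probabilities, provided the message bits are independent of the random numbers. This is the intuition behind the claim that stego output is distributed like ordinary model output. The receiver needs only the shared key and the same distribution to replay each step.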