DiffStega: Towards Universal Training-Free Coverless Image Steganography with Diffusion Models

Yiwei Yang, Zheyuan Liu, Jun Jia, Zhongpai Gao, Yunhao Li, Wei Sun, Xiaohong Liu, Guangtao Zhai

TL;DR

DiffStega tackles universal coverless image steganography by replacing private text prompts with a password-driven reference image and a Noise Flip mechanism, enabling training-free diffusion-based hiding and recovery. A Reference Generator produces $I_{ref}$ from the password and optional controls, while Guidance Injection steers the diffusion process using $I_{ref}$ and prompt 2. The approach provides security guarantees against prompt leakage and wrong-key decryption, and a new UniStega dataset demonstrates improvements in versatility, password sensitivity, and recovery quality over prior diffusion-based CIS. Empirical results show robust performance under various prompts and degradations, with a publicly available implementation to foster reproducibility and practical adoption.

Abstract

Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have utilized text prompts as keys in CIS through diffusion models. However, this approach faces three challenges: it is invalidated when the private prompt is guessed, public prompts must be crafted for semantic diversity, and prompts risk leaking during frequent transmission. To address these issues, we propose DiffStega, an innovative training-free diffusion-based CIS strategy for universal application. DiffStega uses a password-dependent reference image as an image prompt alongside the text, ensuring that only authorized parties can retrieve the hidden information. Furthermore, we develop the Noise Flip technique to further secure the steganography against unauthorized decryption. To comprehensively assess our method across general CIS tasks, we create a dataset comprising various image steganography instances. Experiments indicate substantial improvements in our method over existing ones, particularly in versatility, password sensitivity, and recovery quality. Code is available at https://github.com/evtricks/DiffStega.

Paper Structure (38 sections, 1 equation, 7 figures, 3 tables)

Figures (7)

  • Figure 1: In this scenario, Alice represents a military organization that Eve regards as a target for espionage. Instead of using text prompt 1 as the private key for diffusion-based CIS like previous work (CRoSS), DiffStega uses a pre-determined password as the private key and null-text as prompt 1. DiffStega has no risk of text prompt leakage and can encrypt the original image with arbitrary prompts.
  • Figure 2: The pipeline of DiffStega. (a) We use text prompts and ${I}_{ref}$, generated by RefGen with password $\mathcal{P}_{crt}$, to guide the diffusion process of the hiding stage. The text prompts and the optional control image ${I}_{ctrl}$ (e.g. an OpenPose bone image) are public. (b) With public resources, authenticated parties can reproduce the same ${I}_{ref}$ with $\mathcal{P}_{crt}$ to guide the diffusion process of the recovery stage with text prompts. (c) The scenario where attackers attempt to directly recover the image without any password. (d) A wrong password $\mathcal{P}_{wrg}$ produces a wrong ${I}_{ref}$, distinct from the correct reference image, which misleads the recovery diffusion process. Green / Red denotes the correct / wrong decrypted items. For brevity, we omit the encoder and decoder of the VAE for latent diffusion models.
  • Figure 3: Details of the Reference Generator. It first generates a deterministic initial Gaussian noise from the given password. $I_{ref}$ is then generated by pretrained diffusion models, guided by $I_{ctrl}$ through ControlNet and by prompt 2.
  • Figure 4: Visual comparison of DiffStega and the CRoSS family on the UniStega dataset with different prompts. For DiffStega (ours), we categorize recovery into three possible scenarios. Besides recovering with the correct private key, malicious attackers may attempt recovery without any password, ignoring the use of Guidance Injection and Noise Flip, or recovery with the wrong password $\mathcal{P}_{wrg}$. Note that although we display prompt 1 below the images, DiffStega still uses null-text as prompt 1 instead. CRoSS* uses two diffusion models, consistent with DiffStega, rather than the single model in CRoSS.
  • Figure 5: Deep steganalysis accuracy by XuNet. The closer the curve stays to 50% as the rate of leaked samples increases, the more secure the method is. Diffusion-based methods perform similarly.
  • ...and 2 more figures
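
The Figure 3 caption states that the Reference Generator first derives a deterministic initial Gaussian noise from the password. A minimal sketch of that idea follows, under stated assumptions: the helper names `password_to_seed` and `initial_noise`, the use of SHA-256 to turn the password into a seed, and Python's standard `random` module are all illustrative choices and are not taken from the paper or its repository. A real pipeline would presumably seed the diffusion framework's own generator and sample a latent-shaped tensor.

```python
import hashlib
import random
from typing import List


def password_to_seed(password: str) -> int:
    """Map a password to an integer seed (hypothetical scheme: SHA-256, first 8 bytes)."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def initial_noise(password: str, n: int = 16) -> List[float]:
    """Draw n standard-normal samples from a generator seeded by the password."""
    rng = random.Random(password_to_seed(password))
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# The same password reproduces the same noise; a different password does not.
noise_sender = initial_noise("shared-secret")
noise_receiver = initial_noise("shared-secret")
noise_attacker = initial_noise("wrong-guess")

print(noise_sender == noise_receiver)  # identical seeds give identical samples
print(noise_sender == noise_attacker)  # different seeds give different samples
```

Because the noise depends only on the password, sender and receiver can regenerate the same starting point without transmitting it, which is the property the caption relies on when it says authenticated parties can reproduce the same $I_{ref}$.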