PSyDUCK: Training-Free Steganography for Latent Diffusion

Aqib Mahfuz, Georgia Channing, Mark van der Wilk, Philip Torr, Fabio Pizzati, Christian Schroeder de Witt

TL;DR

PSyDUCK tackles secure, high-capacity steganography with latent diffusion models by providing a training-free, model-agnostic framework that leverages controlled divergence and local mixing in the denoising process. It extends to latent-space video diffusion, delivering superior encoding capacity and robustness over pixel-space baselines and prior latent approaches, without retraining. The work offers theoretical guarantees of bounded security error and indistinguishable-noise security, complemented by extensive image and video experiments showing high recovery accuracy and low detectability. Collectively, PSyDUCK enables practical, scalable generative steganography for real-world applications using latent diffusion models.

Abstract

Recent advances in generative AI have opened promising avenues for steganography, which can securely protect sensitive information for individuals operating in hostile environments, such as journalists, activists, and whistleblowers. However, existing methods for generative steganography have significant limitations, particularly in scalability and their dependence on retraining diffusion models. We introduce PSyDUCK, a training-free, model-agnostic steganography framework specifically designed for latent diffusion models. PSyDUCK leverages controlled divergence and local mixing within the latent denoising process, enabling high-capacity, secure message embedding without compromising visual fidelity. Our method dynamically adapts embedding strength to balance accuracy and detectability, significantly improving upon existing pixel-space approaches. Crucially, PSyDUCK extends generative steganography to latent-space video diffusion models, surpassing previous methods in both encoding capacity and robustness. Extensive experiments demonstrate PSyDUCK's superiority over state-of-the-art techniques, achieving higher transmission accuracy and lower detectability rates across diverse image and video datasets. By overcoming the key challenges associated with latent diffusion model architectures, PSyDUCK sets a new standard for generative steganography, paving the way for scalable, real-world steganographic applications.

Paper Structure

This paper contains 27 sections, 5 theorems, 36 equations, 7 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

(Bounded Security Error) Let $\epsilon_\theta$ be a noise estimator that is bounded. That is, there exists a constant $C>0$ such that, for all relevant $\mathbf{x}$ and $t$, $\|\epsilon_\theta(\mathbf{x}, t)\| \le C$. Then, under the PSyDUCK framework, …

Figures (7)

  • Figure 1: General PSyDUCK scheme. Alice wants to send a secret message to Bob while preventing interception by malicious actors. To achieve this, PSyDUCK embeds a steganographic message of arbitrary length into a cover signal--an image or video--using shared keys and pre-trained latent diffusion models. The resulting stego-signal can be freely shared on the open web, allowing Bob, who possesses the correct key, to decode and retrieve the original message.
  • Figure 2: Encoding and Decoding. An illustration of the PSyDUCK encoding and decoding processes on latent model architectures. Dashed boxes denote custom PSyDUCK operations. To encode secret bitstring $\mathbf{b}$, Alice first denoises in the latent space until timestep $d$ with synchronization key $k_s$. Then, she diverges for $d$ steps using reference keys $k_i$ and subsequently mixes the diverged samples using $\mathbf{b}$. Finally, she puts her sample through the decoder to transmit a final output. To extract $\mathbf{b}$, Bob first encodes the transmission from Alice back into the latent space. He similarly denoises in the latent space until timestep $d$ with synchronization key $k_s$. Then, he diverges for $d$ steps using the reference keys $\{k_i\}_{i=0}^{r-1}$. Bob finally decodes Alice's message $\mathbf{b}$ by comparing his reference samples to Alice's transmission.
  • Figure 3: Diverging and Mixing. (Left) An example of mixing the red and green trajectories based on $\mathbf{b}$ to form a mixed sample when $t=1$ and $r=2$. (Right) Here, the blue path represents the original trajectory of the diffusion model, resulting in $\prescript{\mathcal{C}}{}{\mathbf{x}_0}$. The green path represents the divergent trajectory when conditioned with $k_0$, while the red path represents the divergent trajectory when conditioned with $k_1$. The orange path shows Alice's trajectory upon mixing samples from the two divergent paths.
  • Figure 4: Qualitative analysis of SD v2.1 stegosamples. (a) Examples of stegoimages obtained by encoding the text reported at the top on the cover image with PSyDUCK and SD v2.1. Stegoimages are perceptually indistinguishable. (b) Images generated from SD v2.1 using identical keys but varying divergent step count $d$. To ease visualization, the bottom row displays the differences between cover and stego image magnified by $20\times$.
  • Figure 5: Qualitative analysis of SVD stegosamples. We show three representative frames per video. The visual integrity of the steganographic samples remains high with no noticeable artifacts. The embedded message is the content of the Zimmermann Telegram, reproduced in the experimental-details appendix.
  • ...and 2 more figures
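
The encode/decode flow described in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' implementation: `toy_denoise_step` is a hypothetical stand-in for a latent diffusion denoising step, the latent is a flat vector with one element per message symbol, the mix is assumed to be an element-wise selection among the diverged samples, and the VAE decode/encode round trip between Alice and Bob is omitted. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def toy_denoise_step(x, rng):
    """Hypothetical stand-in for one stochastic denoising step (not a real model)."""
    return 0.9 * x + 0.1 * rng.standard_normal(x.shape)

def reference_samples(n, total_steps, d, k_s, ref_keys):
    """Denoise in sync using key k_s, then diverge for d steps per reference key."""
    sync_rng = np.random.default_rng(k_s)
    x = sync_rng.standard_normal(n)        # shared initial latent
    for _ in range(total_steps - d):       # synchronized steps, down to timestep d
        x = toy_denoise_step(x, sync_rng)
    refs = []
    for k in ref_keys:                     # one diverged trajectory per key k_i
        rng = np.random.default_rng(k)
        x_k = x
        for _ in range(d):
            x_k = toy_denoise_step(x_k, rng)
        refs.append(x_k)
    return np.stack(refs)                  # shape (r, n)

def encode(bits, total_steps, d, k_s, ref_keys):
    """Alice: pick, per element, the diverged sample indexed by the message symbol."""
    refs = reference_samples(bits.size, total_steps, d, k_s, ref_keys)
    return refs[bits, np.arange(bits.size)]

def decode(received, total_steps, d, k_s, ref_keys):
    """Bob: regenerate the references and choose the closest one per element."""
    refs = reference_samples(received.size, total_steps, d, k_s, ref_keys)
    return np.argmin(np.abs(refs - received[None, :]), axis=0)

# Illustrative keys and message (r = 2 reference keys, so symbols are bits).
k_s, ref_keys = 7, [11, 13]
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])
stego = encode(bits, total_steps=10, d=3, k_s=k_s, ref_keys=ref_keys)
recovered = decode(stego, total_steps=10, d=3, k_s=k_s, ref_keys=ref_keys)
```

In this toy setting Bob reproduces Alice's reference samples exactly because both derive them from the same keys, so nearest-reference matching recovers the message. With a real model, the decoder/encoder round trip would perturb the latent, which is presumably why the step count $d$ matters for the accuracy and detectability trade-off mentioned in the abstract.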

Theorems & Definitions (14)

  • Definition 3.1: Diverged Sample
  • Definition 3.2: Local $\sf Mix$ Operation
  • Proposition 1
  • proof: Abridged Proof of Proposition 1
  • Proposition 2
  • proof: Abridged Proof of Proposition 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • ...and 4 more