
$\mathbf{S^2LM}$: Towards Semantic Steganography via Large Language Models

Huanqi Wu, Huangbiao Xu, Runfeng Xie, Jiaxin Cai, Kaixin Zhang, Xiao Ke

TL;DR

The paper defines semantic steganography and formalizes the task of embedding sentence-level content $m$ into a cover image $I_{cover}$ to produce a stego image $I_{stego}$ from which $m$ can be faithfully recovered. It introduces S^2LM, a framework that uses large language models to generate and decode token-based secret embeddings via two trainable MLPs. This enables high semantic payloads (up to ~$500$ words) in $256\times256$ images. The paper also proposes the Invisible Text (IVT) benchmark, with subsets IVT-S/M/L, to evaluate semantic embedding quality and capacity. Experiments across multiple backbones show strong semantic recovery and preserved image quality on IVT-S/M, with performance degrading gracefully on IVT-L; the authors propose a practical capacity guideline of roughly two tokens per image patch. The work broadens steganography from low-level bit hiding to semantic content hiding, enabling high-capacity, content-aware data hiding and opening the way to future cross-modal applications, while acknowledging ethical considerations.
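The two-tokens-per-patch guideline can be turned into a quick capacity estimate. The sketch below is illustrative only: it assumes square, non-overlapping patches and uses a patch size of 16 pixels, a hypothetical value not stated in this summary (the backbones' actual patch size may differ). The helper names `patch_count` and `token_budget` are hypothetical and not part of S^2LM.

```python
def patch_count(height: int, width: int, patch_size: int) -> int:
    """Number of non-overlapping square patches (assumes exact division)."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


def token_budget(height: int, width: int, patch_size: int,
                 tokens_per_patch: float = 2.0) -> int:
    """Approximate secret-token capacity under the ~2 tokens/patch guideline."""
    return int(patch_count(height, width, patch_size) * tokens_per_patch)


if __name__ == "__main__":
    # Hypothetical patch size of 16 px for a 256x256 cover image.
    print(patch_count(256, 256, 16))   # 16 * 16 patches
    print(token_budget(256, 256, 16))  # about 2 tokens per patch
```

Under this assumption a $256\times256$ image has 256 patches and a budget of about 512 tokens; a larger patch size shrinks the budget quadratically. Tokens are counted by the LLM's tokenizer, so the corresponding word count depends on the backbone.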

Abstract

Although steganography has made significant advancements in recent years, it still struggles to embed semantically rich, sentence-level information into carriers. Yet in the era of AIGC, steganographic capacity is more critical than ever. In this work, we present Sentence-to-Image Steganography, an instance of Semantic Steganography, a novel task that enables the hiding of arbitrary sentence-level messages within a cover image. Furthermore, we establish a benchmark named Invisible Text (IVT), comprising a diverse set of sentence-level texts as secret messages for evaluation. Finally, we present $\mathbf{S^2LM}$: Semantic Steganographic Language Model, which utilizes large language models (LLMs) to embed high-level textual information, such as sentences or even paragraphs, into images. Unlike traditional bit-level counterparts, $\mathrm{S^2LM}$ enables the integration of semantically rich content through a newly designed pipeline in which the LLM is involved throughout the entire process. Both quantitative and qualitative experiments demonstrate that our method effectively unlocks new semantic steganographic capabilities for LLMs. The source code will be released soon.

Paper Structure

This paper contains 32 sections, 6 equations, 20 figures, 8 tables.

Figures (20)

  • Figure 1: Overview of our $\mathrm{S^2LM}$ framework vs. previous frameworks. We divide the pipeline into three processes and show how our $\mathrm{S^2LM}$ framework differs from previous methods.
  • Figure 2: The pipeline of the $\mathrm{S^2LM}$ framework.
  • Figure 3: Prompt templates used in the embedding and decoding procedures of $\mathrm{S^2LM}$.
  • Figure 4: Two-stage training strategy for $\mathrm{S^2LM}$.
  • Figure 5: Qualitative results of $\mathrm{S^2LM}$-Qwen2.5-0.5B on secret messages of different lengths.
  • ...and 15 more figures