$\mathbf{S^2LM}$: Towards Semantic Steganography via Large Language Models

Huanqi Wu; Huangbiao Xu; Runfeng Xie; Jiaxin Cai; Kaixin Zhang; Xiao Ke

TL;DR

The paper defines semantic steganography and formalizes the task of embedding sentence-level content $m$ into a cover image $I_{cover}$ to produce a stego image $I_{stego}$ from which $m$ can be faithfully recovered. It introduces $\mathrm{S^2LM}$, a framework that uses large language models to generate and decode token-based secret embeddings via two trainable MLPs, supporting high semantic payloads (up to ~$500$ words) in $256\times256$ images. It also proposes the Invisible Text (IVT) benchmark, with IVT-S/M/L, to evaluate semantic embedding quality and capacity. Experimental results across multiple backbones show strong semantic recovery and preserved image quality on IVT-S/M, with performance degrading gracefully on IVT-L; a practical capacity guideline of roughly two tokens per image patch is proposed. The work broadens steganography from low-level bit hiding to semantic content hiding, enabling high-capacity, content-aware data hiding and future cross-modal applications while acknowledging ethical considerations.
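
The capacity guideline of roughly two tokens per image patch can be turned into a quick back-of-the-envelope budget. The sketch below is only illustrative: the function name `patch_token_budget` is hypothetical, and the patch size of 16 pixels is an assumption that the summary above does not state. Only the $256\times256$ image size and the two-tokens-per-patch ratio come from the text.

```python
def patch_token_budget(image_size: int = 256,
                       patch_size: int = 16,
                       tokens_per_patch: float = 2.0) -> int:
    """Estimate how many secret tokens fit in a square image.

    image_size and tokens_per_patch follow the summary above;
    patch_size=16 is an ASSUMPTION made for illustration only.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2  # non-overlapping square patches
    return int(num_patches * tokens_per_patch)


# 256 / 16 = 16 patches per side, 16 * 16 = 256 patches, 256 * 2 = 512 tokens
print(patch_token_budget())
```

Under the assumed patch size, the budget comes to 512 tokens, which is the same order of magnitude as the ~$500$-word payload mentioned above. Tokens and words are not interchangeable, so this should be read as a rough sanity check rather than a statement about the paper's actual configuration.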

Abstract

Although steganography has made significant advancements in recent years, it still struggles to embed semantically rich, sentence-level information into carriers. However, in the era of AIGC, the capacity of steganography is more critical than ever. In this work, we present Sentence-to-Image Steganography, an instance of Semantic Steganography, a novel task that enables the hiding of arbitrary sentence-level messages within a cover image. Furthermore, we establish a benchmark named Invisible Text (IVT), comprising a diverse set of sentence-level texts as secret messages for evaluation. Finally, we present $\mathbf{S^2LM}$: Semantic Steganographic Language Model, which utilizes large language models (LLMs) to embed high-level textual information, such as sentences or even paragraphs, into images. Unlike traditional bit-level counterparts, $\mathrm{S^2LM}$ enables the integration of semantically rich content through a newly designed pipeline in which the LLM is involved throughout the entire process. Both quantitative and qualitative experiments demonstrate that our method effectively unlocks new semantic steganographic capabilities for LLMs. The source code will be released soon.
