A Content-Preserving Secure Linguistic Steganography

Lingyun Xiang, Chengfu Ou, Xu He, Zhongliang Yang, Yuling Liu

TL;DR

The paper tackles the security limitations of traditional linguistic steganography by proposing a content-preserving paradigm that embeds messages without altering the cover text. It introduces CLstega, a method that combines augmented masking, dynamic distribution steganographic coding, and controllable distribution transformation, fine-tuning a masked language model to map secret messages to target prediction distributions. The approach achieves reliable extraction, with an extraction success rate (ESR) of 100%. It also achieves near-perfect security, with anti-steganalysis results approaching random chance, while maintaining competitive embedding capacity and superior imperceptibility. The work offers a practical pathway toward perfectly secure covert communication in natural language without content modification, supported by a detailed methodology and comprehensive experiments.
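The TL;DR reports an ESR of 100%; the abstract spells ESR out as the extraction success rate. The sketch below shows one way such a rate could be computed. It assumes a transmission counts as a success only when the whole secret bit string is recovered exactly. The function name and this exact-match definition are illustrative assumptions, and the paper's precise definition may differ, for example by counting correct bits rather than whole messages.

```python
from __future__ import annotations


def extraction_success_rate(sent: list[str], extracted: list[str]) -> float:
    """Fraction of transmissions whose secret bit string is recovered exactly.

    Assumption (hypothetical definition): only an exact match of the whole
    message counts as a success; no partial, bit-level credit is given.
    """
    if len(sent) != len(extracted):
        raise ValueError("sent and extracted must have the same length")
    if not sent:
        raise ValueError("at least one transmission is required")
    successes = sum(s == e for s, e in zip(sent, extracted))
    return successes / len(sent)


print(extraction_success_rate(["0110", "1011"], ["0110", "1011"]))  # 1.0
print(extraction_success_rate(["0110", "1011"], ["0110", "1001"]))  # 0.5
```

Under this definition, a single flipped bit makes the whole transmission a failure. That is the stricter reading of "100% extraction success".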

Abstract

Existing linguistic steganography methods primarily rely on content transformations to conceal secret messages. However, they often introduce subtle, innocent-looking deviations between normal and stego texts, posing potential security risks in real-world applications. To address this challenge, we propose a content-preserving linguistic steganography paradigm for perfectly secure covert communication without modifying the cover text. Based on this paradigm, we introduce CLstega (Content-preserving Linguistic steganography), a novel method that embeds secret messages through controllable distribution transformation. CLstega first applies an augmented masking strategy to locate and mask embedding positions where the probability distributions predicted by a masked language model (MLM) are easily adjustable for transformation. Subsequently, a dynamic distribution steganographic coding strategy encodes secret messages by deriving target distributions from the original probability distributions. To achieve this transformation, CLstega carefully selects target words for the embedding positions and uses them as labels to construct a masked sentence dataset. This dataset is used to fine-tune the original MLM, producing a target MLM capable of directly extracting secret messages from the cover text. This approach ensures perfect security of secret messages while fully preserving the integrity of the original cover text. Experimental results show that CLstega can achieve a 100% extraction success rate and outperforms existing methods in security, effectively balancing embedding capacity and security.
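The abstract describes encoding secret bits as target distributions derived from the original MLM distributions, with fine-tuning making the MLM produce them. The toy round trip below illustrates that idea using a hypothetical rank-based code; it is not the paper's dynamic distribution coding. Each 2-bit chunk selects one of the four most probable candidate words at a masked position. A target distribution is then derived that makes that word the argmax. The receiver reads the argmax and converts its rank back into bits.

Several parts of the sketch are assumptions:

- the toy distributions;
- `BITS_PER_POSITION`;
- the boost-and-renormalize rule in `derive_target_distribution`;
- the premise that the receiver can reproduce the original candidate ranking.

No MLM is loaded and no fine-tuning happens. The returned list of target distributions simply stands in for the fine-tuned target MLM.

```python
from __future__ import annotations

# Hypothetical stand-ins for original-MLM predictions at two masked positions
# of the (unchanged) cover text. Real values would come from an actual MLM.
ORIGINAL_DISTS = [
    {"quiet": 0.40, "calm": 0.30, "still": 0.20, "silent": 0.10},
    {"walked": 0.50, "ran": 0.25, "strolled": 0.15, "hurried": 0.10},
]
BITS_PER_POSITION = 2  # assumption: 2**2 = 4 candidate words per position


def candidates(dist: dict[str, float], bits: int) -> list[str]:
    """Top 2**bits words by probability (ties broken alphabetically)."""
    ranked = sorted(dist, key=lambda w: (-dist[w], w))
    if len(ranked) < 2 ** bits:
        raise ValueError("not enough candidates for the requested bits")
    return ranked[: 2 ** bits]


def derive_target_distribution(dist: dict[str, float], target: str,
                               margin: float = 0.1) -> dict[str, float]:
    """Lift the target word above the current maximum, then renormalize."""
    scores = dict(dist)
    scores[target] = max(dist.values()) + margin
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def embed(bits: str) -> list[dict[str, float]]:
    """Turn each BITS_PER_POSITION-bit chunk into one target distribution."""
    n = BITS_PER_POSITION
    if len(bits) != n * len(ORIGINAL_DISTS):
        raise ValueError("message length must match embedding capacity")
    targets = []
    for i, dist in enumerate(ORIGINAL_DISTS):
        index = int(bits[i * n:(i + 1) * n], 2)
        word = candidates(dist, n)[index]
        targets.append(derive_target_distribution(dist, word))
    # In CLstega these targets would be realized by fine-tuning the MLM;
    # in this toy, the list itself plays the role of the "target MLM".
    return targets


def extract(target_dists: list[dict[str, float]]) -> str:
    """Read the argmax word per position and map its candidate rank to bits."""
    n = BITS_PER_POSITION
    bits = []
    for orig, tgt in zip(ORIGINAL_DISTS, target_dists):
        predicted = max(tgt, key=tgt.get)
        index = candidates(orig, n).index(predicted)
        bits.append(format(index, f"0{n}b"))
    return "".join(bits)


secret = "1001"
print(extract(embed(secret)))  # 1001
```

The cover text never appears in the embedding path. Only the model's predictions at the masked positions change, which is what lets a content-preserving scheme leave the text untouched.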

Paper Structure

This paper contains 32 sections, 11 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Frameworks of LS paradigms. Red solid arrows indicate embedding, blue dashed arrows indicate extraction.
  • Figure 2: The overall framework of the proposed content-preserving linguistic steganography (CLstega).
  • Figure 3: Extraction success rate for different masking strategies and numbers of embedding positions $k$.
  • Figure 4: Key challenges and solution insights for the content-preserving LS paradigm.
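Figure 3 compares masking strategies across different numbers of embedding positions $k$. The abstract says embedding positions are placed where the MLM's predicted distribution is easily adjustable. The sketch below chooses $k$ positions using predictive entropy as a proxy, on the assumption that a flatter distribution needs a smaller shift to change its top prediction. This entropy criterion, the function names and the example distributions are hypothetical stand-ins, not CLstega's augmented masking strategy.

```python
from __future__ import annotations

import math


def entropy(dist: dict[str, float]) -> float:
    """Shannon entropy (in bits) of a word -> probability mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def select_positions(position_dists: dict[int, dict[str, float]], k: int) -> list[int]:
    """Return the k token positions with the flattest predicted distributions.

    Assumption: higher entropy is treated as "easier to adjust"; this is only
    an illustrative proxy for the paper's augmented masking strategy.
    """
    if not 0 < k <= len(position_dists):
        raise ValueError("k must be between 1 and the number of positions")
    ranked = sorted(position_dists, key=lambda i: (-entropy(position_dists[i]), i))
    return sorted(ranked[:k])  # report chosen positions in sentence order


# Hypothetical MLM predictions at three token positions of a cover sentence.
dists = {
    0: {"the": 0.97, "a": 0.03},
    3: {"quiet": 0.40, "calm": 0.30, "still": 0.20, "silent": 0.10},
    5: {"walked": 0.50, "ran": 0.25, "strolled": 0.15, "hurried": 0.10},
}
print(select_positions(dists, k=2))  # [3, 5]
```

Here the near-deterministic position 0 (entropy about 0.19 bits) is skipped. The two flatter positions (about 1.85 and 1.74 bits) are chosen as embedding slots.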