State-of-the-art Advances of Deep-learning Linguistic Steganalysis Research

Yihao Wang; Ru Zhang; Yifan Tang; Jianyi Liu

TL;DR

This paper surveys deep-learning linguistic steganalysis, motivated by the difficulty of detecting text steganography produced by advanced generative methods. It formalizes the problem as a vector-space mapping pipeline in which a network $TStegaNet$ extracts features $F$ from the text embeddings $V$, and a classifier maps those features to decision probabilities $P$. The authors classify existing work into two vector-space mappings (statistical and language-model embeddings) and four feature-extraction paradigms, then compare experimental performance across multiple datasets. Key findings show that language-model vector embeddings, especially when combined in hybrid architectures, yield higher detection accuracy at the cost of increased training time, and that several framework designs (e.g., hierarchical co-learning, semantic–syntactic preservation) can enhance robustness. The paper suggests future directions: new learning paradigms, reduced reliance on NLP-centric features, and more interpretable, task-specific steganalysis solutions with practical impact for large-scale online content screening.
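The pipeline named in the TL;DR (texts mapped to vectors $V$, features $F$ extracted by $TStegaNet$, probabilities $P$ produced by a classifier) can be illustrated with a small sketch. This is only a toy under stated assumptions, not the models surveyed in the paper: the random embedding table, the mean-pooling stand-in for `tsteganet`, the linear-plus-softmax `classifier`, and the untrained weights `W` and `b` are all hypothetical choices made for illustration.

```python
import math
import random


def embed(tokens, vocab, dim=8, seed=0):
    """Map each token to a fixed pseudo-random vector.

    Assumption: stands in for a real statistical or language-model embedding.
    """
    rng = random.Random(seed)
    table = {tok: [rng.uniform(-1, 1) for _ in range(dim)] for tok in sorted(vocab)}
    return [table[t] for t in tokens]


def tsteganet(V):
    """Hypothetical feature extractor: mean-pool the token vectors into F."""
    dim = len(V[0])
    return [sum(v[i] for v in V) / len(V) for i in range(dim)]


def classifier(F, W, b):
    """Linear layer followed by softmax; returns P = [p_cover, p_stego]."""
    logits = [sum(w * f for w, f in zip(row, F)) + bias for row, bias in zip(W, b)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


tokens = "the weather is nice today".split()
V = embed(tokens, set(tokens))          # text -> vectors V
F = tsteganet(V)                        # V -> features F
W = [[0.1] * len(F), [-0.1] * len(F)]   # untrained, illustrative weights
b = [0.0, 0.0]
P = classifier(F, W, b)                 # F -> probabilities P
```

In a real system the pooling step would be replaced by a trained network (for example a CNN, RNN, or Transformer encoder), and `W` and `b` would be learned from labeled cover and stego texts.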

Abstract

With the evolution of generative linguistic steganography techniques, conventional steganalysis falls short in robustly quantifying the alterations induced by steganography, which complicates detection. Consequently, the research paradigm has pivoted towards deep-learning-based linguistic steganalysis. This study offers a comprehensive review of existing contributions and evaluates prevailing developmental trajectories. Specifically, we first give a formal exposition of the general formulas for linguistic steganalysis and compare this field with the domain of text classification. We then classify existing work at two levels, based on vector space mapping and feature extraction models, and compare research motivations, model advantages, and other details. A comparative analysis of the experiments is conducted to assess performance. Finally, we discuss the challenges faced by this field and propose several directions for future development, along with key issues that urgently need to be addressed.

Paper Structure (10 sections, 4 equations, 2 figures, 4 tables)

Introduction
Preliminaries
Design details of existing work
Overview
"Statistical vector embedding" method details
"Language-model vector embedding" method details
Linguistic steganalysis framework details
Experiments
Results and analysis
Conclusion and Outlook

Figures (2)

Figure 1: Overall workflow of deep-learning linguistic steganalysis models.
Figure 2: Classification of deep-learning linguistic steganalysis.
