State-of-the-art Advances of Deep-learning Linguistic Steganalysis Research
Yihao Wang, Ru Zhang, Yifan Tang, Jianyi Liu
TL;DR
This paper surveys deep-learning linguistic steganalysis, motivated by the difficulty of detecting text steganography produced by advanced generative methods. It formalizes the problem as a vector-space mapping pipeline: texts are mapped to vector representations $V$, a network $TStegaNet$ extracts features $F$ from $V$, and a classifier turns $F$ into decision probabilities $P$. The authors classify existing work by two vector-space mappings (statistical and language-model embeddings) and four feature-extraction paradigms, then compare experimental performance across multiple datasets. Key findings show that language-model vector embeddings, especially when combined in hybrid architectures, yield higher detection accuracy at the cost of increased training time, and that several framework designs (e.g., hierarchical co-learning, semantic–syntactic preservation) can enhance robustness. The paper suggests future directions toward new learning paradigms, reduced reliance on NLP-centric features, and more interpretable, task-specific steganalysis solutions with practical impact for large-scale online content screening.
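The pipeline summarized above (texts to vectors $V$, $TStegaNet$ to features $F$, classifier to probabilities $P$) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the survey's method: the function names `embed`, `tsteganet`, and `classify` are hypothetical, the embedding is a seeded pseudo-random stand-in, the feature extractor is plain mean pooling, and the classifier weights are untrained.

```python
import math
import random


def embed(tokens, dim=8):
    """Map each token to a fixed pseudo-random vector (toy stand-in for V)."""
    vectors = []
    for tok in tokens:
        rng = random.Random(tok)  # seeding by token: same word, same vector
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def tsteganet(V):
    """Toy feature extractor: mean-pool the token vectors into one vector F.

    Assumption: a real TStegaNet would be a trained neural network.
    """
    if not V:
        raise ValueError("need at least one token vector")
    dim = len(V[0])
    return [sum(vec[i] for vec in V) / len(V) for i in range(dim)]


def classify(F, weights, bias=0.0):
    """Logistic classifier: returns P = [p_cover, p_stego]."""
    z = sum(w * f for w, f in zip(weights, F)) + bias
    p_stego = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - p_stego, p_stego]


tokens = "the weather is nice today".split()
V = embed(tokens)               # text -> vector space
F = tsteganet(V)                # vectors -> features
P = classify(F, [0.5] * 8)      # features -> decision probabilities
print(P)
```

With untrained weights the probabilities carry no meaning; the sketch only shows how the three stages compose.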
Abstract
With the evolution of generative linguistic steganography techniques, conventional steganalysis falls short in robustly quantifying the alterations induced by steganography, thereby complicating detection. Consequently, the research paradigm has pivoted towards deep-learning-based linguistic steganalysis. This study offers a comprehensive review of existing contributions and evaluates prevailing developmental trajectories. Specifically, we first provide a formal exposition of the general formulation of linguistic steganalysis and compare this field with the domain of text classification. We then classify existing work at two levels, vector-space mapping and feature-extraction models, and compare the research motivations, model advantages, and other details. A comparative analysis of the experiments is conducted to assess performance. Finally, we discuss the challenges facing this field and propose several directions for future development, together with key issues that urgently need to be addressed.
