Pseudo-label Based Domain Adaptation for Zero-Shot Text Steganalysis

Yufei Luo, Zhen Yang, Ru Zhang, Jianyi Liu

TL;DR

Zero-shot cross-domain text steganalysis faces challenges from limited labeled data and domain shift across corpora. The authors propose PDTS, a two-stage framework that combines a BERT-based domain-agnostic feature extractor with a single-layer Bi-LSTM domain-specific extractor, a feature filtration network, and a classifier. Training uses labeled source-domain data for pre-training and unlabeled target-domain data with progressively expanded pseudo-labels (expansion parameter $p$, default $p=0.1$) for fine-tuning. Empirical results on Twitter, Movie, and News datasets show PDTS achieves higher detection accuracy and F1 than MDA and SANet, particularly at higher embedding rates, indicating strong zero-shot transfer and robustness. This work reduces reliance on labeled target-domain data and offers a practical approach for real-world text steganalysis under domain shift.

Abstract

Currently, most methods for text steganalysis are based on deep neural networks (DNNs). However, in real-life scenarios, obtaining enough labeled stego-text to correctly train networks with a large number of parameters is often challenging and costly. Additionally, due to a phenomenon known as dataset bias or domain shift, recognition models trained on a large dataset generalize poorly to novel datasets and tasks. Therefore, to address the issues of missing labeled data and inadequate model generalization in text steganalysis, this paper proposes a cross-domain stego-text analysis method (PDTS) based on pseudo-labeling and unsupervised domain adaptation. Specifically, we propose a model architecture combining pre-trained BERT with a single-layer Bi-LSTM to learn and extract generic features across tasks and generate task-specific representations. Considering that different features contribute unequally to steganalysis, we further design a feature filtering mechanism to achieve selective feature propagation, thereby enhancing classification performance. We train the model on labeled source-domain data and adapt it to the target-domain data distribution through self-training on pseudo-labels assigned to unlabeled target-domain data. In the label estimation step, instead of a static sampling strategy, we propose a progressive sampling strategy that gradually increases the number of selected pseudo-label candidates. Experimental results demonstrate that our method performs well in zero-shot text steganalysis, achieving high detection accuracy even without labeled data in the target domain, and outperforms current zero-shot text steganalysis methods.
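
The architecture described above has four pieces: a pre-trained BERT encoder for domain-agnostic features, a single-layer Bi-LSTM for task-specific representations, a feature filtration network, and a binary classifier. A minimal PyTorch sketch follows; the hidden size, the mean pooling, and the sigmoid-gate form of the filter are assumptions made for illustration, since this summary does not give the paper's exact dimensions or filtration design, and `PDTSModel` is an illustrative name.

```python
# Minimal sketch of a PDTS-style model: BERT (generic features) ->
# single-layer Bi-LSTM (task-specific features) -> feature filtration
# gate -> binary classifier (cover vs. stego). Dimensions and the gate
# design are assumptions, not taken from the paper.
import torch
import torch.nn as nn
from transformers import BertModel

class PDTSModel(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)  # domain-agnostic extractor
        self.bilstm = nn.LSTM(
            input_size=self.bert.config.hidden_size,
            hidden_size=lstm_hidden,
            num_layers=1,
            batch_first=True,
            bidirectional=True,
        )                                                 # domain-specific extractor
        feat_dim = 2 * lstm_hidden
        self.filter_gate = nn.Sequential(                 # feature filtration network
            nn.Linear(feat_dim, feat_dim),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(feat_dim, 2)          # cover vs. stego

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        h, _ = self.bilstm(h)
        h = h.mean(dim=1)                # pool token states into one vector
        h = h * self.filter_gate(h)      # selective feature propagation
        return self.classifier(h)
```

A sigmoid gate keeps the filtration step differentiable, so whatever "selective feature propagation" the paper implements can plausibly be learned end-to-end alongside the classifier.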

Paper Structure

This paper contains 17 sections, 11 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: The overall structure of PDTS for cross-domain text steganalysis consists of two main stages. The first stage, represented by the blue line, involves pre-training using labeled source domain data to initialize model parameters. Following this, the second stage, indicated by the green line, is the fine-tuning process. Here, the pre-trained model generates pseudo-labels for unlabeled target domain data. These refined pseudo-labeled data are then used to further fine-tune the model, enhancing its adaptability and accuracy on target domain data.
  • Figure 2: Illustration of the two-stage process of pre-training and fine-tuning. The 'epoch' in the figure refers to the number of times pseudo-labels are selected and used to fine-tune the model. $m_t$ refers to the number of pseudo-labels selected in each round of fine-tuning, which grows with each training round (see the sketch after this list).
  • Figure 3: Visualization of the features extracted by PDTS on the six cross-domain tasks. Blue dots represent cover text; red dots represent steganographic text.
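
Figure 2's caption describes the core of the fine-tuning stage: in each round, the model pseudo-labels the unlabeled target set and keeps the $m_t$ most confident predictions, with $m_t$ growing round by round. The sketch below illustrates this under stated assumptions: the max-softmax confidence criterion, the linear expansion schedule, and the names `select_pseudo_labels`, `progressive_schedule`, and `target_batches` are illustrative guesses, not taken from the paper.

```python
# Hedged sketch of progressive pseudo-label selection: score every
# unlabeled target example with the current model and keep the m_t most
# confident predictions, where m_t expands each round by a fraction p of
# the target-set size (p = 0.1 is the paper's default per the TL;DR).
import torch
import torch.nn.functional as F

def select_pseudo_labels(model, target_batches, m_t):
    """Label the unlabeled target set; keep the m_t most confident examples."""
    scored = []
    model.eval()
    with torch.no_grad():
        for ids, input_ids, attention_mask in target_batches:
            probs = F.softmax(model(input_ids, attention_mask), dim=-1)
            conf, pseudo = probs.max(dim=-1)   # max-softmax confidence (assumed)
            scored.extend(zip(conf.tolist(), ids.tolist(), pseudo.tolist()))
    scored.sort(reverse=True)                  # most confident first
    return [(i, y) for _, i, y in scored[:m_t]]

def progressive_schedule(n_target, p=0.1, rounds=10):
    """Assumed linear schedule: round t selects m_t = min(t * p * N, N)."""
    return [min(round(t * p * n_target), n_target) for t in range(1, rounds + 1)]
```

Under this assumed schedule, with $p = 0.1$ and ten rounds the model fine-tunes on the most confident 10% of the target set in round 1, 20% in round 2, and the full set by round 10, which matches the "progressively expanded pseudo-labels" described in the TL;DR.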

Theorems & Definitions (1)

  • Definition 1