Detecting Backdoor Attacks via Similarity in Semantic Communication Systems

Ziyang Wei; Yili Jiang; Jiaqi Huang; Fangtian Zhong; Sohan Gyawali

TL;DR

This work tackles backdoor attacks in semantic communication systems by introducing a threshold-based defense that relies on semantic similarity rather than altering model structure or data formats. The approach builds a clean semantic baseline from trusted data and uses Mahalanobis-like similarity to detect deviations caused by poisoned samples, with two main thresholding strategies: $T_{\max}$ and $T_{\text{mean}}$, plus percentile-based adjustments. Experiments on MNIST demonstrate high detection performance across poisoning ratios, achieving 100% recall with competitive accuracy under mean-threshold settings and robust accuracy with percentile-based thresholds. The method preserves the integrity of clean data and avoids architectural changes, offering a practical defense for real-time semantic communication deployments and enabling broader applicability to noisy or evolving threat landscapes.

Abstract

Semantic communication systems, which leverage Generative AI (GAI) to transmit semantic meaning rather than raw data, are poised to revolutionize modern communications. However, they are vulnerable to backdoor attacks, a type of poisoning manipulation that embeds malicious triggers into training datasets. As a result, backdoor attacks mislead inference on poisoned samples while leaving clean samples unaffected. Existing defenses may alter the model structure (such as neuron pruning, which can degrade inference performance on clean inputs) or impose strict requirements on data formats (such as "Semantic Shield", which requires image-text pairs). To address these limitations, this work proposes a defense mechanism that leverages semantic similarity to detect backdoor attacks without modifying the model structure or imposing data format constraints. By analyzing deviations in the semantic feature space and establishing a threshold-based detection framework, the proposed approach effectively identifies poisoned samples. The experimental results demonstrate high detection accuracy and recall across varying poisoning ratios, underlining the effectiveness of the proposed solution.
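
The threshold-based detection idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes semantic features are fixed-length vectors, uses a plain Mahalanobis distance to a baseline fitted on trusted data, and takes $T_{\max}$, $T_{\text{mean}}$, and the percentile threshold to be the maximum, mean, and a chosen percentile of the clean distances. Those definitions, the function names, the ridge term, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_clean_baseline(clean_features):
    """Estimate the mean and inverse covariance of trusted (clean) features."""
    mu = clean_features.mean(axis=0)
    cov = np.cov(clean_features, rowvar=False)
    # Small ridge term keeps the covariance invertible (assumption of this sketch).
    cov += 1e-6 * np.eye(cov.shape[0])
    return mu, np.linalg.inv(cov)


def mahalanobis_distance(features, mu, cov_inv):
    """Per-sample Mahalanobis distance to the clean baseline."""
    diff = features - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


def compute_thresholds(clean_distances, percentile=95.0):
    """Candidate thresholds derived only from clean data (assumed definitions)."""
    return {
        "T_max": float(clean_distances.max()),
        "T_mean": float(clean_distances.mean()),
        "T_pct": float(np.percentile(clean_distances, percentile)),
    }


def flag_poisoned(features, mu, cov_inv, threshold):
    """Flag samples whose distance to the baseline exceeds the threshold."""
    return mahalanobis_distance(features, mu, cov_inv) > threshold


# Synthetic stand-ins for semantic features (hypothetical data, not MNIST).
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 8))
suspect = rng.normal(loc=6.0, size=(50, 8))  # strongly shifted, as if triggered

mu, cov_inv = fit_clean_baseline(clean)
thr = compute_thresholds(mahalanobis_distance(clean, mu, cov_inv))
flags = flag_poisoned(suspect, mu, cov_inv, thr["T_max"])
print(thr, int(flags.sum()), "of", len(flags), "suspect samples flagged")
```

In this sketch, a lower threshold such as the mean flags more samples, which favors recall at the cost of false positives on clean data, while the maximum never flags a sample from the trusted set itself; a percentile sits between the two.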
