GenFighter: A Generative and Evolutive Textual Attack Removal

Md Athikul Islam, Edoardo Serra, Sushil Jajodia

TL;DR

This paper addresses the vulnerability of NLP Transformer models to word-substitution adversarial attacks by proposing GenFighter, a distribution-aware defense that learns the training classification distribution and uses an evolutionary search over paraphrases to bring out-of-distribution inputs back in line with that distribution. It combines a conditional paraphrase generator, a Gaussian Mixture Model (GMM) anomaly detector, and an ensemble inference scheme that produces a robust prediction from multiple paraphrase candidates. Experiments on IMDB, AG News, and SST-2 show that GenFighter achieves higher accuracy under attack and lower attack success rates than state-of-the-art defenses against PWWS, TextFooler, and BERT-Attack, while forcing attackers to issue more queries. The ablation and hyperparameter analyses indicate that each component (paraphrasing, anomaly detection, and evolution) contributes meaningfully to robustness and transferability, underscoring the practical value of distribution-aware defenses for NLP systems.
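To make the anomaly-detection step concrete, the sketch below scores an input by its log-likelihood under a Gaussian mixture: high values mean the input looks like the training data, low values flag it as potentially out-of-distribution. It is a minimal illustration, not GenFighter's code. It assumes each text has already been mapped to a fixed-length feature vector and that the mixture parameters (diagonal covariances) were fitted beforehand; which features GenFighter actually models is not stated in this summary, and the toy parameters are made up.

```python
import math
from typing import List, Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance diag(var)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def normality_score(
    x: Sequence[float],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> float:
    """log p(x) = log sum_k pi_k N(x | mu_k, diag(var_k)) for an already-fitted GMM."""
    return logsumexp([
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ])


# Toy two-component mixture in 2-D (illustrative parameters, not fitted to real data).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

near = normality_score([0.1, -0.2], weights, means, variances)
far = normality_score([10.0, -10.0], weights, means, variances)
print(near > far)  # True: the far point is much less likely under the mixture
```

In practice one would fit the mixture with a library such as scikit-learn, whose `GaussianMixture.score_samples` returns the same per-sample log-likelihood.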

Abstract

Adversarial attacks pose significant challenges to deep neural networks (DNNs), such as Transformer models, in natural language processing (NLP). This paper introduces a novel defense strategy, called GenFighter, that enhances adversarial robustness by learning and reasoning over the training classification distribution. GenFighter identifies potentially malicious instances that deviate from this distribution, transforms them into semantically equivalent instances aligned with the training data, and uses ensemble techniques to produce a unified, robust response. Through extensive experiments, we show that GenFighter outperforms state-of-the-art defenses in accuracy under attack and attack success rate. Additionally, attacks against it require a high number of queries, making them harder to mount in real-world scenarios. The ablation study shows that our approach, which integrates transfer learning, a generative/evolutive procedure, and an ensemble method, provides an effective defense against NLP adversarial attacks.
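The three evaluation quantities mentioned above (accuracy under attack, attack success rate, and queries per attack) can be computed from per-example attack records as sketched below. The formulas are our assumption, following the convention commonly reported by attack frameworks such as TextAttack: examples the model already misclassifies are skipped, Aua% counts failed attacks over all examples, and ASR% counts successful attacks over attacked examples. The `AttackRecord` type and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AttackRecord:
    label: int       # ground-truth label
    clean_pred: int  # model prediction on the unperturbed input
    adv_pred: int    # model prediction on the attacker's final input
    queries: int     # model queries the attacker spent on this example


def attack_metrics(records: List[AttackRecord]) -> Tuple[float, float, float]:
    """Return (Aua%, ASR%, average queries) over a list of attack records."""
    if not records:
        raise ValueError("need at least one record")
    # Examples the model already gets wrong are not attacked (counted as skipped).
    attacked = [r for r in records if r.clean_pred == r.label]
    succeeded = [r for r in attacked if r.adv_pred != r.label]
    # Accuracy under attack: still correct after the attack, over all examples.
    aua = 100.0 * (len(attacked) - len(succeeded)) / len(records)
    # Attack success rate: successful attacks over attacked examples.
    asr = 100.0 * len(succeeded) / len(attacked) if attacked else 0.0
    # Average number of queries per attacked example.
    avg_queries = sum(r.queries for r in attacked) / len(attacked) if attacked else 0.0
    return aua, asr, avg_queries


records = [
    AttackRecord(label=1, clean_pred=1, adv_pred=1, queries=100),  # attack failed
    AttackRecord(label=0, clean_pred=0, adv_pred=1, queries=50),   # attack succeeded
    AttackRecord(label=1, clean_pred=0, adv_pred=0, queries=0),    # skipped
    AttackRecord(label=0, clean_pred=0, adv_pred=0, queries=150),  # attack failed
]
aua, asr, avg_q = attack_metrics(records)
print(f"Aua={aua:.1f}%  ASR={asr:.1f}%  avg queries={avg_q:.1f}")
# prints "Aua=50.0%  ASR=33.3%  avg queries=100.0"
```

A defense that raises Aua% and lowers ASR% while also pushing up the average query count is what the paper reports for GenFighter.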

Paper Structure

This paper contains 18 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: An overview of GenFighter. The paraphraser (a conditional generative textual model) processes the input texts and forwards them, together with their paraphrases $P_x$, to the anomaly detection model. The anomaly detection model assigns normality scores $S_x$ and selects the top-$K$ most normal texts. Depending on whether they meet the threshold $\tau$, these selected candidates are either looped back into the paraphraser module or passed to the target model. Finally, the victim (target) model makes a weighted-mean prediction over the top-$K$ normal texts, using their normality scores as weights.
  • Figure 2: Hyperparameter analysis of GenFighter against three word-substitution attacks on the AG's News dataset. We randomly sample 100 test examples and compute the accuracy under attack (Aua%) for each attack across all hyperparameter settings.
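The generate, score, select, and ensemble loop described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paraphraser, normality scorer, and victim classifier are passed in as hypothetical callables, and two details the caption leaves open are our own choices: (1) the loop stops once the least normal of the top-$K$ candidates reaches $\tau$, or after `max_iters` rounds; (2) the weighted mean uses softmax-normalized normality scores as weights. The toy stand-ins at the end are purely illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple


def select_top_k(pool: List[str], normality: Callable[[str], float], k: int) -> List[Tuple[float, str]]:
    """Score every candidate and keep the k most normal (ties keep pool order)."""
    scored = [(normality(t), t) for t in pool]
    scored.sort(key=lambda st: st[0], reverse=True)  # stable even with reverse=True
    return scored[:k]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def defend_and_predict(
    text: str,
    paraphrase: Callable[[str], List[str]],      # hypothetical paraphraser
    normality: Callable[[str], float],           # hypothetical anomaly scorer (higher = more normal)
    classify: Callable[[str], Sequence[float]],  # hypothetical victim model returning class probs
    k: int = 3,
    tau: float = 0.0,
    max_iters: int = 5,
) -> List[float]:
    # Round 1: score the raw input together with its paraphrases P_x.
    top = select_top_k([text] + paraphrase(text), normality, k)
    for _ in range(max_iters - 1):
        # Assumed stopping rule: even the least normal top-K candidate meets tau.
        if top[-1][0] >= tau:
            break
        # Otherwise loop the survivors back through the paraphraser.
        survivors = [t for _, t in top]
        pool = survivors + [p for t in survivors for p in paraphrase(t)]
        top = select_top_k(list(dict.fromkeys(pool)), normality, k)  # de-duplicated
    # Assumed weighting: softmax over the normality scores S_x.
    weights = softmax([s for s, _ in top])
    probs = [classify(t) for _, t in top]
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(len(probs[0]))]


# Toy stand-ins: the "attacker" replaced "film" with the off-distribution token "flim".
TRAIN_VOCAB = {"a", "great", "film", "movie"}

def toy_paraphrase(t: str) -> List[str]:
    return [t.replace("flim", "film"), t.replace("great", "terrific")]

def toy_normality(t: str) -> float:
    return -float(sum(w not in TRAIN_VOCAB for w in t.split()))  # minus the OOV count

def toy_classify(t: str) -> List[float]:
    return [0.2, 0.8] if "film" in t.split() else [0.6, 0.4]

print(defend_and_predict("a great flim", toy_paraphrase, toy_normality, toy_classify, k=1))
# prints [0.2, 0.8]
```

With `k=1` the single most normal candidate ("a great film") is classified; a larger `k` spends more classifier calls to average over several candidates, which is the ensemble trade-off the caption describes.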