Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP

Yangyi Chen, Hongcheng Gao, Ganqu Cui, Fanchao Qi, Longtao Huang, Zhiyuan Liu, Maosong Sun

TL;DR

The paper argues that adversarial NLP research, particularly in security, has muddled goals by blending security, evaluation, explainability, and augmentation. It introduces Advbench, a security-focused benchmark collection across five tasks (misinformation, disinformation, toxic content, spam, sensitive information) with two datasets per task, and ROCKET, a decision-based, heuristic attack method designed to simulate real-world attacker goals. In extensive experiments on Advbench, ROCKET demonstrates superior attack efficacy and efficiency while preserving adversarial meaning, and the experiments also reveal limitations in current defense methods. The authors advocate for a standardized security-oriented research paradigm that separates attack goals from other use cases and emphasizes practical realism and reproducibility, potentially reshaping how adversarial NLP is studied and evaluated.

Abstract

Textual adversarial samples play important roles in multiple subfields of NLP research, including security, evaluation, explainability, and data augmentation. However, most work mixes all these roles, obscuring the problem definitions and research goals of the security role that aims to reveal the practical concerns of NLP models. In this paper, we rethink the research paradigm of textual adversarial samples in security scenarios. We discuss the deficiencies in previous work and propose our suggestions that the research on the Security-oriented adversarial NLP (SoadNLP) should: (1) evaluate their methods on security tasks to demonstrate the real-world concerns; (2) consider real-world attackers' goals, instead of developing impractical methods. To this end, we first collect, process, and release a security datasets collection Advbench. Then, we reformalize the task and adjust the emphasis on different goals in SoadNLP. Next, we propose a simple method based on heuristic rules that can easily fulfill the actual adversarial goals to simulate real-world attack methods. We conduct experiments on both the attack and the defense sides on Advbench. Experimental results show that our method has higher practical value, indicating that the research paradigm in SoadNLP may start from our new benchmark. All the code and data of Advbench can be obtained at https://github.com/thunlp/Advbench.

Paper Structure

This paper contains 63 sections, 3 equations, 2 figures, and 11 tables.

Figures (2)

  • Figure 1: Real-world cases of adversarial attacks. Adversarially modified content is highlighted in red.
  • Figure 2: Attack success rate under the restriction of maximum query times.
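
The quantity in Figure 2, attack success rate under a cap on the number of queries, can be illustrated with a minimal sketch. This is not the authors' ROCKET method or their evaluation code. The classifier `toy_predict`, the perturbation generator `toy_candidates`, and the sample texts are hypothetical stand-ins chosen only to show how a decision-based attack (one that sees only the predicted label) might be scored against a query budget.

```python
from typing import Callable, Iterable, List


def attack_with_budget(
    text: str,
    predict: Callable[[str], str],
    candidates: Callable[[str], Iterable[str]],
    max_queries: int,
) -> bool:
    """Return True if a candidate flips the predicted label within the budget.

    Decision-based setting: the attacker only observes the label.
    Every call to `predict` counts as one query, including the first.
    """
    original_label = predict(text)
    queries = 1
    for candidate in candidates(text):
        if queries >= max_queries:
            break  # budget exhausted before this candidate could be tried
        queries += 1
        if predict(candidate) != original_label:
            return True
    return False


def success_rate(
    texts: List[str],
    predict: Callable[[str], str],
    candidates: Callable[[str], Iterable[str]],
    max_queries: int,
) -> float:
    """Fraction of texts for which the attack succeeds under the budget."""
    wins = sum(
        attack_with_budget(t, predict, candidates, max_queries) for t in texts
    )
    return wins / len(texts)


# --- Hypothetical stand-ins, for illustration only ---

def toy_predict(text: str) -> str:
    """Toy keyword filter: flags any text containing 'free'."""
    return "spam" if "free" in text.lower() else "ham"


def toy_candidates(text: str) -> Iterable[str]:
    """Two heuristic perturbations: append punctuation, then swap 'e' for '3'."""
    yield text + " ."
    yield text.replace("e", "3")


TEXTS = ["Get free money", "free prize inside", "FREE entry"]

if __name__ == "__main__":
    for budget in (2, 3):
        rate = success_rate(TEXTS, toy_predict, toy_candidates, budget)
        print(f"max_queries={budget}: success rate = {rate:.2f}")
```

In this toy setup the character swap evades the keyword filter only for lowercase text, so raising the budget from 2 to 3 queries lets the attack reach the perturbation that succeeds on two of the three samples.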