Context-aware Adversarial Attack on Named Entity Recognition

Shuguang Chen, Leonardo Neves, Thamar Solorio

TL;DR

The paper examines the robustness of named entity recognition (NER) models to adversarial inputs by introducing a context-aware attack that perturbs informative non-entity words. It presents a two-stage pipeline: candidate selection (using methods such as POS tagging, dependency parsing, chunking, and gradient-based importance) and candidate replacement (synonyms from WordNet and masked language model (MLM) substitutions with RoBERTa-base), which together generate natural adversarial examples while preserving label validity. Empirical results on CoNLL03, OntoNotes5.0, and W-NUT17 show that perturbing informative words, especially with gradient-based or random selection paired with MLM replacements, degrades $F_1$ scores more than strong baselines, at some cost in textual similarity. These findings highlight vulnerabilities in NER systems and provide a practical framework for adversarial auditing and subsequent defense development in real-world NLP pipelines.

Abstract

In recent years, large pre-trained language models (PLMs) have achieved remarkable performance on many natural language processing benchmarks. Despite their success, prior studies have shown that PLMs are vulnerable to attacks from adversarial examples. In this work, we focus on the named entity recognition task and study context-aware adversarial attack methods to examine the model's robustness. Specifically, we propose perturbing the most informative words for recognizing entities to create adversarial examples and investigate different candidate replacement methods to generate natural and plausible adversarial examples. Experiments and analyses show that our methods are more effective in deceiving the model into making wrong predictions than strong baselines.

Paper Structure (16 sections, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Comparison between adversarial attack with and without perturbing informative words.
  • Figure 2: The pipeline of the proposed context-aware adversarial attack, including candidate selection to determine which words to perturb and candidate replacement for replacing candidate words.
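
The two stages named in the Figure 2 caption, candidate selection followed by candidate replacement, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy synonym table (a stand-in for WordNet or MLM substitutions), and the importance scores (a stand-in for gradient-based importance) are all assumptions made up for illustration. The only property taken from the summary above is that entity tokens are left untouched so the gold labels stay valid.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical substitute table; a real attack would query WordNet or a masked LM.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "visited": ["toured", "saw"],
    "president": ["leader", "chief"],
    "yesterday": ["recently"],
}


def select_candidates(labels: Sequence[str], scores: Sequence[float], k: int) -> List[int]:
    """Stage 1: return up to k non-entity positions, highest importance first."""
    non_entity = [i for i, label in enumerate(labels) if label == "O"]
    ranked = sorted(non_entity, key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def replace_candidates(
    tokens: Sequence[str],
    positions: Sequence[int],
    synonyms: Dict[str, List[str]],
    rng: random.Random,
) -> List[str]:
    """Stage 2: swap each selected word for a substitute when one is available."""
    perturbed = list(tokens)
    for i in positions:
        options = synonyms.get(tokens[i].lower())
        if options:
            perturbed[i] = rng.choice(options)
    return perturbed


def attack(
    tokens: Sequence[str],
    labels: Sequence[str],
    scores: Sequence[float],
    k: int = 2,
    seed: int = 0,
) -> List[str]:
    """Run both stages; the label sequence is unchanged because entities are skipped."""
    rng = random.Random(seed)
    positions = select_candidates(labels, scores, k)
    return replace_candidates(tokens, positions, TOY_SYNONYMS, rng)


tokens = ["The", "president", "visited", "Paris", "yesterday"]
labels = ["O", "O", "O", "B-LOC", "O"]
scores = [0.05, 0.60, 0.90, 0.99, 0.20]  # made-up importance values

adversarial = attack(tokens, labels, scores)
print(adversarial)
```

In this toy run, "Paris" has the highest score but is skipped because it carries an entity label, so the two perturbed positions are "visited" and "president". Evaluating a tagger on the perturbed tokens against the original labels would then show how much its predictions depend on that context.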