Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych

Abstract

Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios. We consider this a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual input perturbations demonstrate. We then investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. We then explore three shielding methods (visual character embeddings, adversarial training, and rule-based recovery), which substantially improve the robustness of the models. However, the shielding methods still fall short of the performance achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.
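
As a rough illustration of the kind of visual perturbation the abstract describes, the sketch below replaces each character with a lookalike with probability p. The function name `perturb` and the hand-written `LOOKALIKES` table are hypothetical and chosen only for this example; they are not the paper's attack or its character embedding spaces.

```python
import random

# Hypothetical lookalike table for illustration only (an assumption, not taken
# from the paper): a handful of leet-speak style substitutions.
LOOKALIKES = {"a": "4", "e": "3", "i": "!", "o": "0", "l": "1", "s": "5", "t": "7"}


def perturb(text, p, rng=None):
    """Replace each character that has a lookalike with probability p."""
    if rng is None:
        rng = random.Random()
    out = []
    for ch in text:
        sub = LOOKALIKES.get(ch.lower())
        # Characters without an entry in the table are always kept as-is.
        if sub is not None and rng.random() < p:
            out.append(sub)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # A fixed seed makes the perturbed output reproducible between runs.
    print(perturb("visual attacks on text", 0.5, random.Random(0)))
```

With p = 0 the text is returned unchanged, and with p = 1 every character that has an entry in the table is replaced, so p acts as a perturbation level in this toy setting.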

Paper Structure

This paper contains 38 sections, 5 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Human annotation experiment. Error bars indicate the standard deviation across annotators. For "easy", we merge the cases $p=0.4/0.8$.
  • Figure 2: Degradation of SOTA systems for different perturbation levels when attacked by VIPER($p$,DCES). The colored regions show how the performance of other SOTA systems relates to ours (i.e., they all suffer from similar degradation).
  • Figure 3: AT (with ICES replacements) and CE tested on DCES perturbed data. The colored regions show AT (with random replacements).
  • Figure 4: AT$+$CE (with ICES replacements) and RBR on DCES perturbed data. The colored regions show AT (with random replacements).
  • Figure 5: Degradation of SOTA systems for different perturbation levels when attacked by VIPER($p$,ECES). The colored regions show how the performance of other SOTA systems relates to ours.
  • ...and 4 more figures