VertAttack: Taking advantage of Text Classifiers' horizontal vision

Jonathan Rusert

TL;DR

This work identifies a vulnerability in text classifiers: they cannot read vertically written text. It proposes VertAttack, a two-stage adversarial attack that greedily selects information-rich words and rewrites them vertically, fooling classifiers while preserving meaning for humans. Experiments on 5 datasets and 4 transformer models show substantial accuracy degradation and transferability, and a human study confirms that the perturbed texts remain readable. It also analyzes defenses (whitespace-based preprocessing, reverse reconstruction), strengthens the attack with chaff, and discusses implications for OCR pipelines and robustness research.
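The vertical rewriting and the reverse-reconstruction defense mentioned above can be illustrated with a minimal Python sketch. The layout is an assumption and not necessarily the one used in the paper: each selected word keeps its first letter on the top line and spells its remaining letters straight down the same column. The helpers `render_vertical` and `reconstruct` are hypothetical names; `reconstruct` reverses the layout by reading each such column top to bottom.

```python
import re

def render_vertical(words, vertical):
    """Write the words at indices in `vertical` top to bottom (hypothetical layout)."""
    pieces, cols, pos = [], [], 0
    for i, w in enumerate(words):
        # A vertical word keeps only its first letter on the top line.
        piece = w[0] if i in vertical else w
        pieces.append(piece)
        cols.append(pos)
        pos += len(piece) + 1  # +1 for the separating space
    rows = [" ".join(pieces)]
    depth = max((len(words[i]) for i in vertical), default=1)
    for r in range(1, depth):
        row = [" "] * len(rows[0])
        for i in vertical:
            if r < len(words[i]):
                row[cols[i]] = words[i][r]
        rows.append("".join(row).rstrip())
    return "\n".join(rows)

def reconstruct(text):
    """Hypothetical column-reading defense: rejoin vertically written words."""
    rows = text.split("\n")
    top, below = rows[0], rows[1:]
    width = len(top)
    # Pad lower rows so every column of the top line can be indexed.
    below = [row.ljust(width) for row in below]
    words = []
    for m in re.finditer(r"\S+", top):
        token, col = m.group(), m.start()
        if len(token) == 1 and below and below[0][col] != " ":
            # Letters stacked under a single character form one vertical word.
            letters = [token]
            for row in below:
                if row[col] == " ":
                    break
                letters.append(row[col])
            words.append("".join(letters))
        else:
            words.append(token)
    return " ".join(words)
```

For example, `render_vertical(["this", "movie", "is", "good"], {3})` puts "this movie is g" on the first line and "o", "o", "d" beneath the "g". A model that reads the text row by row sees only the top line followed by isolated letters, which is why it loses the word, while `reconstruct` recovers "this movie is good" and hints at why column-aware preprocessing is a natural counter.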

Abstract

Text classification systems have continuously improved in performance over the years. However, nearly all current SOTA classifiers share a similar shortcoming: they process text in a horizontal manner. Vertically written words will not be recognized by a classifier. In contrast, humans are easily able to recognize and read words written both horizontally and vertically. Hence, a human adversary could write problematic words vertically, and the meaning would still be preserved for other humans. We simulate such an attack, VertAttack. VertAttack identifies which words a classifier relies on and then rewrites those words vertically. We find that VertAttack greatly reduces the accuracy of 4 different transformer models on 5 datasets. For example, on the SST2 dataset, VertAttack drops RoBERTa's accuracy from 94% to 13%. Furthermore, since VertAttack does not replace the words, meaning is easily preserved. We verify this via a human study and find that crowdworkers are able to correctly label 77% of perturbed texts, compared to 81% of the original texts. We believe VertAttack offers a look into how humans might circumvent classifiers in the future and will thus inspire research into more robust algorithms.
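One common way to estimate which words a classifier relies on is leave-one-out importance: delete each word in turn and measure how far the probability of the true label falls. The sketch below assumes this scoring approximates VertAttack's word-selection stage; `LEXICON`, `prob_positive`, and `rank_words` are hypothetical, and the toy lexicon classifier merely stands in for a transformer such as RoBERTa.

```python
import math

# Hypothetical toy lexicon classifier standing in for a transformer model.
LEXICON = {"great": 2.0, "good": 1.5, "fun": 1.0, "boring": -2.0, "bad": -1.5}

def prob_positive(words):
    # Logistic function of the summed lexicon weights.
    score = sum(LEXICON.get(w.lower(), 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

def rank_words(words, label):
    """Return word indices ordered by leave-one-out importance for `label`."""
    def p_label(ws):
        p = prob_positive(ws)
        return p if label == 1 else 1.0 - p

    base = p_label(words)
    drops = []
    for i in range(len(words)):
        without = words[:i] + words[i + 1:]
        drops.append((base - p_label(without), i))
    # Largest probability drop first; ties keep sentence order.
    drops.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in drops]
```

Here `rank_words("a great and fun movie".split(), 1)` returns `[1, 3, 0, 2, 4]`: "great" matters most, then "fun", and the neutral words contribute nothing, so "great" would be the first candidate to rewrite vertically.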

Paper Structure (26 sections, 1 equation, 4 figures, 10 tables, 3 algorithms)

Introduction
Threat Model
Attack Goals
Methodology
Word Selection
Word Transformation
Experimental Setup
Datasets
Classifiers
Metrics
VertAttack Results
Human Study
Comparisons with other attacks
Malicious Use - Offensive Language
Effect on OCR + Classifier
...and 11 more sections

Figures (4)

Figure 1: Examples of texts perturbed by VertAttack. Humans can still understand the vertically written words, while classifiers struggle to read them.
Figure 2: VertAttack basic overview. A word is first selected from the input text and then rewritten vertically. The classifier provides feedback in the form of class probabilities. The process is repeated until the classifier misclassifies the text.
Figure 3: Instructions shown to Amazon Mechanical Turk crowdworkers.
Figure 4: The classifier's ability to correctly classify text as the number of perturbed words increases. The classifier examined is BERT, with VertAttack also using BERT for feedback.
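The feedback loop in Figure 2 can be sketched as a greedy search. In this hypothetical version, each round tries rewriting every remaining word vertically, queries the classifier, and keeps the rewrite that most lowers the probability of the true label, stopping once the prediction flips or a word budget is spent. Re-scoring candidates every round, the toy lexicon classifier, the helper names, and modelling a vertical word as letters separated by newlines (so a whitespace tokenizer sees only isolated letters) are all assumptions rather than details taken from the paper.

```python
import math

# Hypothetical toy lexicon classifier standing in for BERT or RoBERTa.
LEXICON = {"great": 2.0, "good": 1.5, "fun": 1.0, "boring": -2.0, "bad": -1.5}

def prob_positive(text):
    # Whitespace tokenization: a vertically written word splits into
    # single letters, none of which appear in the lexicon.
    score = sum(LEXICON.get(tok.lower(), 0.0) for tok in text.split())
    return 1.0 / (1.0 + math.exp(-score))

def predict(text):
    return 1 if prob_positive(text) >= 0.5 else 0

def to_text(words, vertical):
    # Simplified stand-in for vertical layout: letters separated by newlines.
    return " ".join("\n".join(w) if i in vertical else w
                    for i, w in enumerate(words))

def vert_attack(words, label, max_words=None):
    """Greedily rewrite words vertically until the prediction flips."""
    budget = len(words) if max_words is None else max_words

    def p_label(vset):
        # Probability the classifier assigns to the true label.
        p = prob_positive(to_text(words, vset))
        return p if label == 1 else 1.0 - p

    vertical = set()
    while predict(to_text(words, vertical)) == label and len(vertical) < budget:
        candidates = [i for i in range(len(words)) if i not in vertical]
        if not candidates:
            break
        # Feedback step: keep the rewrite that most lowers p(true label).
        best = min(candidates, key=lambda i: p_label(vertical | {i}))
        vertical.add(best)
    text = to_text(words, vertical)
    return text, vertical, predict(text) != label
```

For instance, `vert_attack("a boring and bad film".split(), 0)` rewrites "boring" and then "bad", after which the toy classifier's score is exactly neutral and it predicts the positive class. The `max_words` budget is a hypothetical knob for limiting how many words are perturbed, the quantity Figure 4 varies.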
