Adversarial Attacks and Dimensionality in Text Classifiers

Nandish Chattopadhyay; Atreya Goswami; Anupam Chattopadhyay

TL;DR

The effectiveness of adversarial attacks on NLP text classifiers depends strongly on embedding dimensionality: adversarial samples are most effective against models that share the embedding dimension of the model they were generated on. The authors propose a dimension-aware defense based on ensembles of models with varying embedding dimensions and report substantial robustness gains over single-model baselines. They quantify perturbations using the $L_1$, $L_2$, and $L_{\infty}$ distance metrics under a bounded budget, and validate the approach on the IMDB and Twitter datasets using a word-level, TextFooler-like attack implemented via TextAttack. The work offers a practical direction for improving NLP robustness through dimensionality diversity, with implications for deploying dimension-aware defenses in real-world text classification systems.
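
As a rough illustration of how a perturbation could be measured under the three norms named above, the sketch below compares an original and a perturbed embedding vector. The function names `perturbation_norms` and `within_budget`, the toy vectors, and the budget value are assumptions made for illustration only; the paper's own measurement procedure is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def perturbation_norms(original: Sequence[float],
                       perturbed: Sequence[float]) -> Tuple[float, float, float]:
    """Return the (L1, L2, L_inf) distances between two equal-length vectors."""
    if len(original) != len(perturbed):
        raise ValueError("vectors must have the same dimensionality")
    diffs = [abs(a - b) for a, b in zip(original, perturbed)]
    l1 = sum(diffs)                            # sum of absolute differences
    l2 = math.sqrt(sum(d * d for d in diffs))  # Euclidean distance
    linf = max(diffs, default=0.0)             # largest single-coordinate change
    return l1, l2, linf


def within_budget(original: Sequence[float],
                  perturbed: Sequence[float],
                  epsilon: float,
                  norm: str = "linf") -> bool:
    """Check whether the perturbation stays within a budget under one norm."""
    names = ("l1", "l2", "linf")
    if norm not in names:
        raise ValueError(f"norm must be one of {names}")
    values = dict(zip(names, perturbation_norms(original, perturbed)))
    return values[norm] <= epsilon


# Toy example (hypothetical 3-dimensional embedding).
clean = [0.0, 0.0, 0.0]
adversarial = [3.0, -4.0, 0.0]
l1, l2, linf = perturbation_norms(clean, adversarial)
```

The three norms can rank the same perturbation differently: here a budget of 4.5 is satisfied under $L_{\infty}$ but exceeded under $L_1$ and $L_2$, which is one reason the choice of metric matters when a budget is stated.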

Abstract

Adversarial attacks on machine learning algorithms have been a key deterrent to the adoption of AI in many real-world use cases. They significantly undermine high-performance neural networks by forcing misclassifications. These attacks introduce minute, structured perturbations into the test samples. The perturbations are generally imperceptible to human annotators, but trained neural networks and other models are sensitive to them. Historically, adversarial attacks were first identified and studied in the domain of image processing. In this paper, we study adversarial examples in the field of natural language processing, specifically in text classification tasks. We investigate the reasons for adversarial vulnerability, particularly in relation to the inherent dimensionality of the model. Our key finding is that there is a very strong correlation between the embedding dimensionality of the adversarial samples and their effectiveness on models tuned with input samples of the same embedding dimension. We use this sensitivity to design an adversarial defense mechanism: ensembles of models with varying inherent dimensionality that thwart the attacks. The defense is tested on multiple datasets for its efficacy in providing robustness. We also study the problem of measuring adversarial perturbation using different distance metrics. For all of these studies, we have run tests on multiple models of varying dimensionality and used a word-vector-level adversarial attack to substantiate the findings.
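
The ensemble idea described above can be sketched as follows. This is a minimal illustration that assumes simple majority voting over classifiers built on different embedding dimensions; the aggregation rule actually used in the paper is not stated in this summary, and the names `ensemble_predict`, `majority_vote`, and the stand-in classifiers are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# A classifier is modelled as any callable mapping a text to a label.
Classifier = Callable[[str], Hashable]


def majority_vote(votes: Sequence[Hashable]) -> Hashable:
    """Return the most common label; ties go to the label seen first."""
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(models: Sequence[Classifier], text: str) -> Hashable:
    """Query every member model and aggregate their labels by majority vote."""
    return majority_vote([model(text) for model in models])


# Stand-ins for classifiers trained with different embedding dimensions
# (assumed values such as 50, 100, 200). Here the middle model plays the
# role of the one fooled by an adversarial sample crafted for its dimension.
def model_dim_50(text: str) -> int:
    return 1


def model_dim_100(text: str) -> int:
    return 0  # fooled by the dimension-matched attack


def model_dim_200(text: str) -> int:
    return 1


prediction = ensemble_predict(
    [model_dim_50, model_dim_100, model_dim_200],
    "a perturbed movie review",
)
```

Using an odd number of member models avoids ties in the binary case. The sketch relies on the abstract's premise that an adversarial sample crafted against one embedding dimension is less effective on models of other dimensions, so the unaffected members can outvote the fooled one.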

Paper Structure (27 sections, 1 equation, 4 figures, 5 tables)

Introduction
Motivation
Contribution
Organization
Background and Related Works
Text Classifiers
Adversarial Examples
Literature Review
Dimensionality and Adversarial Attack
Properties of Dimensionality
Dimension Sensitivity
Ensembling
Measuring Adversarial Perturbation
Implementation
Pipeline
...and 12 more sections

Figures (4)

Figure 1: Schematic of how the models' adversarial vulnerability depends on embedding dimension.
Figure 2: A recurrent neural network architecture for text classification tasks.
Figure 3: An adversarial attack on a text classifier neural network model.
Figure 4: Dimensionality and adversarial attacks.
