SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation

Saurabh Kumar Pandey, Sachin Vashistha, Debrup Das, Somak Aditya, Monojit Choudhury

TL;DR

SMAB tackles the scalability bottleneck of sensitivity-based analysis in sequence classification by introducing a two-level multi-armed bandit that estimates local sensitivity ($L_w$) and global sensitivity ($G_t^w$) for words without access to model weights or gold labels. It combines a sample-replace-predict perturbation strategy with Thompson Sampling to allocate exploration across words and sentences, with a time complexity of $O(T\cdot (|\Sigma| + |D|\cdot |V|\cdot \mathrm{cost}(f)))$. The authors validate SMAB on a CheckList case study, demonstrate that global sensitivities separate high- and low-sensitivity words, and show that sensitivity distributions correlate with accuracy across languages, which makes sensitivity usable as an unsupervised proxy for accuracy. They further show that sensitivity-guided adversarial attacks (PromptAttack and ParaphraseAttack) improve attack effectiveness, and they assess the generated examples with human quality evaluations, highlighting practical implications for robustness evaluation and potential ethical considerations.
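The two-level loop summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the Beta posterior per word, the fractional posterior update, the uniform choice of the inner-arm sentence, and the names `toy_classifier`, `local_sensitivity`, and `estimate_global_sensitivity` are all hypothetical. The paper's actual reward and update rules are not stated in this section.

```python
import random
from collections import defaultdict


def toy_classifier(sentence):
    """Hypothetical stand-in for the target classifier f (1 = positive, 0 = negative)."""
    negative = {"bad", "awful", "terrible"}
    return 0 if any(tok in negative for tok in sentence.split()) else 1


def local_sensitivity(word, sentence, vocab, classifier, rng, n_samples=4):
    """Sample-replace-predict: fraction of sampled replacements of `word` that flip the label."""
    original = classifier(sentence)
    candidates = [v for v in vocab if v != word]
    flips = 0
    for _ in range(n_samples):
        replacement = rng.choice(candidates)
        perturbed = " ".join(replacement if tok == word else tok
                             for tok in sentence.split())
        flips += int(classifier(perturbed) != original)
    return flips / n_samples


def estimate_global_sensitivity(corpus, classifier, steps=500, seed=0):
    """Outer arms are words; each word's inner arms are the sentences containing it."""
    rng = random.Random(seed)
    sentences_with = defaultdict(list)
    for sent in corpus:
        for tok in set(sent.split()):
            sentences_with[tok].append(sent)
    vocab = sorted(sentences_with)

    # Assumption: a Beta(1, 1) prior over each word's global sensitivity.
    alpha = {w: 1.0 for w in vocab}
    beta = {w: 1.0 for w in vocab}

    for _ in range(steps):
        # Thompson Sampling: draw from every posterior, pull the arm with the largest draw.
        word = max(vocab, key=lambda w: rng.betavariate(alpha[w], beta[w]))
        # Simplification: the inner arm (sentence) is picked uniformly at random.
        sentence = rng.choice(sentences_with[word])
        l_w = local_sensitivity(word, sentence, vocab, classifier, rng)
        # Assumption: the local sensitivity acts as a fractional Bernoulli reward.
        alpha[word] += l_w
        beta[word] += 1.0 - l_w

    # Posterior mean as the global sensitivity estimate.
    return {w: alpha[w] / (alpha[w] + beta[w]) for w in vocab}


corpus = ["the movie was bad", "the food was awful",
          "the movie was good", "the plot was fine"]
scores = estimate_global_sensitivity(corpus, toy_classifier)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:>6}  {score:.2f}")
```

Only classifier predictions are used, so the loop needs neither model weights nor gold labels. Because Thompson Sampling concentrates pulls on words whose posterior looks high, estimates for low-sensitivity words rest on fewer samples in this sketch.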

Abstract

To understand the complexity of sequence classification tasks, Hahn et al. (2021) proposed sensitivity as the number of disjoint subsets of the input sequence that can each be individually changed to change the output. Though effective, calculating sensitivity at scale using this framework is costly because of its exponential time complexity. Therefore, we introduce a Sensitivity-based Multi-Armed Bandit framework (SMAB), which provides a scalable approach for calculating word-level local (sentence-level) and global (aggregated) sensitivities with respect to an underlying text classifier for any dataset. We establish the effectiveness of our approach through various applications. We perform a case study on a CHECKLIST-generated sentiment analysis dataset, where we show that our algorithm indeed captures intuitively high- and low-sensitivity words. Through experiments on multiple tasks and languages, we show that sensitivity can serve as a proxy for accuracy in the absence of gold data. Lastly, we show that guiding perturbation prompts using sensitivity values in adversarial example generation improves attack success rate by 15.58%, whereas using sensitivity as an additional reward in adversarial paraphrase generation gives a 12.00% improvement over SOTA approaches. Warning: Contains potentially offensive content.
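To see why the subset-based definition is expensive, consider a toy analogue under the assumption of binary inputs, where "changing" a subset means flipping its bits. Read literally, the definition then matches block sensitivity from Boolean function analysis. The exhaustive search below is a hypothetical illustration of that reading, not the estimator of Hahn et al. (2021); the function names are made up for this sketch.

```python
from itertools import combinations


def sensitive_blocks(f, x):
    """All non-empty index subsets whose bits, when flipped, change f(x)."""
    base = f(x)
    blocks = []
    for size in range(1, len(x) + 1):
        for idx in combinations(range(len(x)), size):
            y = list(x)
            for i in idx:
                y[i] ^= 1  # flip the bits in this block
            if f(tuple(y)) != base:
                blocks.append(frozenset(idx))
    return blocks


def block_sensitivity(f, x):
    """Largest number of pairwise-disjoint sensitive blocks, found by exhaustive search."""
    blocks = sensitive_blocks(f, x)

    def best(start, used):
        top = 0
        for k in range(start, len(blocks)):
            if not (blocks[k] & used):
                top = max(top, 1 + best(k + 1, used | blocks[k]))
        return top

    return best(0, frozenset())


def f_or(bits):
    return int(any(bits))


def f_and(bits):
    return int(all(bits))


print(block_sensitivity(f_or, (0, 0, 0)))   # flipping any single bit changes OR
print(block_sensitivity(f_and, (0, 0, 0)))  # only flipping all three bits changes AND
```

An input of length $n$ has $2^n - 1$ candidate subsets, and each one requires a call to the function, so the search becomes infeasible for realistic sentence lengths. For text, every position also has many possible replacements rather than a single flip, which makes the cost worse still.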

Paper Structure

This paper contains 42 sections, 11 equations, 6 figures, 12 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of our SMAB framework. The outer arm consists of all words in the corpus, each linked to a set of sentences in the inner arm. $w_1$ is a word in the outer arm, and $S_{w_1}$ is the set of sentences in the inner arm that contains $w_1$. $G_t^{w_1}$ is the Global Sensitivity of word $w_1$ at step $t$. We utilize a sample-replace-predict strategy to estimate local sensitivity values $L_{w_1}$ for a word $w_1$. Here, $P_{w_1}$ is the set of predicted labels obtained after perturbing $S_{w_1}$ sentences and using the target classifier. $s_1$ is chosen randomly from $P_{w_1}$ while $s_2$ is chosen such that it has the highest reward. The local sensitivity values of a word help to update its Global Sensitivity values, which helps in better outer arm selection in the next time step.
  • Figure 2: Variation of SASR with sensitivity threshold on CheckList test dataset for UCB and TS. For UCB, words are only present in bins (0-0.1) and (0.9-1.0), hence SASR becomes constant after 0.1. It shows that Thompson Sampling proves to be a better sampling strategy for this task as compared to UCB.
  • Figure 3: KL divergence vs. accuracy across languages of the mHate dataset using mBERT.
  • Figure 4: KL divergence vs. accuracy across languages of the XNLI dataset using mDeBERTa.
  • Figure 5: Scatter plot of estimated global sensitivities of arms of INV and DIR templates using TS. Words from DIR templates have higher estimated global sensitivity and are spread in the whole space as opposed to words from INV templates.
  • ...and 1 more figure