Deceiving Question-Answering Models: A Hybrid Word-Level Adversarial Approach

Jiyao Li, Mingze Ni, Yongshun Gong, Wei Liu

TL;DR

This paper introduces QA-Attack (Question Answering Attack), a novel word-level adversarial strategy that fools QA models and surpasses existing adversarial techniques in terms of success rate, semantic change, BLEU score, fluency, and grammar error rate.

Abstract

Deep learning underpins most of today's advanced natural language processing (NLP) tasks, such as text classification, neural machine translation (NMT), abstractive summarization, and question answering (QA). However, the robustness of these models, particularly QA models, against adversarial attacks is a critical concern that remains insufficiently explored. This paper introduces QA-Attack (Question Answering Attack), a novel word-level adversarial strategy that fools QA models. Our attention-based attack combines a customized attention mechanism with a deletion-ranking strategy to identify and target specific words within contextual passages. It creates deceptive inputs by carefully choosing and substituting synonyms, preserving grammatical integrity while misleading the model into producing incorrect responses. Our approach is versatile across question types, particularly when dealing with long textual inputs. Extensive experiments on multiple benchmark datasets demonstrate that QA-Attack successfully deceives baseline QA models and surpasses existing adversarial techniques in terms of success rate, semantic change, BLEU score, fluency, and grammar error rate.
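The pipeline described above (score each context word, aggregate the scores, perturb the top-ranked words) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: attention scores are passed in rather than computed by the paper's customized attention mechanism; the removal score is assumed to be the drop in a caller-supplied `confidence` function when a word is deleted; aggregation is assumed to be a weighted sum of min-max-normalized scores with a hypothetical weight `alpha`; and a hand-written synonym dictionary stands in for the BERT-generated synonyms. The function names `removal_scores`, `min_max`, `hybrid_top_k`, and `substitute` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def removal_scores(words: Sequence[str],
                   confidence: Callable[[List[str]], float]) -> List[float]:
    """Assumed removal-based score: how much the model's confidence in its
    original answer drops when word i is deleted from the context."""
    base = confidence(list(words))
    return [base - confidence(list(words[:i]) + list(words[i + 1:]))
            for i in range(len(words))]


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_top_k(attention: Sequence[float], removal: Sequence[float],
                 k: int, alpha: float = 0.5) -> List[int]:
    """Aggregate the two rankings (assumed: weighted sum of normalized
    scores) and return the indices of the k highest-scoring words."""
    combined = [alpha * a + (1 - alpha) * r
                for a, r in zip(min_max(attention), min_max(removal))]
    return sorted(range(len(combined)), key=lambda i: combined[i],
                  reverse=True)[:k]


def substitute(words: Sequence[str], positions: Sequence[int],
               synonyms: Dict[str, List[str]]) -> List[str]:
    """Replace each selected word with its first listed synonym, if any.
    A fixed dictionary stands in for BERT-generated candidates here."""
    out = list(words)
    for i in positions:
        candidates = synonyms.get(out[i].lower())
        if candidates:
            out[i] = candidates[0]
    return out


if __name__ == "__main__":
    context = "the capital of france is paris".split()

    def toy_confidence(ws: List[str]) -> float:
        # Toy stand-in for a QA model's confidence in the answer "paris".
        return 0.9 if "paris" in ws else 0.1

    attention = [0.05, 0.1, 0.05, 0.3, 0.1, 0.4]  # made-up attention scores
    removal = removal_scores(context, toy_confidence)
    top = hybrid_top_k(attention, removal, k=2)
    print(top)  # [5, 3]
    print(substitute(context, top, {"capital": ["seat"], "france": ["gaul"]}))
```

Setting `alpha` to 1.0 or 0.0 reduces the sketch to attention-only or removal-only ranking, loosely mirroring the ABR and RBR variants compared in Figure 2; how the paper's hybrid variant actually weights the two signals is not stated in this section.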

Paper Structure

This paper contains 29 sections, 1 equation, 5 figures, 13 tables, and 1 algorithm.

Figures (5)

  • Figure 1: The workflow of our QA-Attack algorithm for QA models. It processes question-context pairs through two parallel modules: Attention-based Ranking (ABR) and Removal-based Ranking (RBR). Using customized attention mechanisms and removal ranking strategies, these modules generate an attention score and a removal score, respectively, for each word. The scores are then aggregated, and the $top_k$ highest-scoring words are selected as candidates. Finally, these candidates are replaced with BERT-generated synonyms to create adversarial examples that can effectively mislead the QA model.
  • Figure 2: F1 score analysis of the HFR, ABR, and RBR variants of QA-Attack with different $top_k$ values, tested on the SQuAD 1.1 and BoolQ datasets.
  • Figure 3: Performance of the T5 model retrained on SQuAD 1.1 mixed with adversarial examples generated by TASA [TASA], TMYC [wallace2019trick], RobustQA [robustmultilingual], T3 [t3], and our QA-Attack.
  • Figure 4: F1 scores when attacking T5 models retrained with increasing proportions of adversarial examples generated by the baseline methods (TASA [TASA], TMYC [wallace2019trick], RobustQA [robustmultilingual], T3 [t3]) and our QA-Attack.
  • Figure 5: F1 scores for transfer attacks on three other QA models using adversarial examples generated for UnifiedQA. A lower F1 score indicates a more effective attack.
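Figures 2, 4, and 5 measure attack strength through F1 score: the lower the QA model's F1 on adversarial inputs, the more its answers were changed. This section does not say which F1 implementation is used, so the sketch below assumes the token-overlap F1 from the SQuAD v1.1 evaluation script (lowercase, strip punctuation and articles, then compare token bags). The official script also takes the maximum F1 over several gold answers; for brevity this sketch assumes one gold answer per question. The helper `mean_f1` is a hypothetical name.

```python
import re
import string
from collections import Counter
from typing import List

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and the
    articles a/an/the, then collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def token_f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred = normalize_answer(prediction).split()
    gold = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: List[str], answers: List[str]) -> float:
    """Average F1 over a dataset, reported on a 0-100 scale."""
    total = sum(token_f1(p, a) for p, a in zip(predictions, answers))
    return 100.0 * total / len(answers)


if __name__ == "__main__":
    gold = ["Eiffel Tower", "Paris"]
    clean_preds = ["the Eiffel Tower", "Paris"]  # made-up clean-input answers
    attacked_preds = ["Eiffel", "London"]        # made-up post-attack answers
    print(mean_f1(clean_preds, gold))  # 100.0
    print(mean_f1(attacked_preds, gold))
```

Comparing the clean and attacked averages gives the F1 drop caused by the perturbations; for BoolQ's yes/no answers this token F1 reduces to per-example exact match, although the paper may score BoolQ differently.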