Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script

Xi Cao, Yuan Sun, Jiajun Li, Quzong Gesang, Nuo Qun, Tashi Nyima

TL;DR

The paper addresses the vulnerability of DNN language models to textual adversarial attacks, especially in low-resource languages. It introduces HITL-GAT, a four-stage human-in-the-loop system that continuously evolves adversarial texts, benchmarks, and robustness evaluation alongside new models and datasets. Through a Tibetan-script case study, it establishes AdvTS as the first Tibetan adversarial robustness benchmark and open-sources the system under GPLv3, providing a practical reference for other low-resource languages. The work offers a scalable framework to enhance cross-language NLP security, explainability, and data augmentation, with broader implications for abugida-based languages in the Asia-Pacific region.

Abstract

DNN-based language models excel across various NLP tasks but remain highly vulnerable to textual adversarial attacks. While adversarial text generation is crucial for NLP security, explainability, evaluation, and data augmentation, related work remains overwhelmingly English-centric, leaving the problem of constructing high-quality and sustainable adversarial robustness benchmarks for lower-resourced languages both difficult and understudied. First, method customization for lower-resourced languages is complicated due to linguistic differences and limited resources. Second, automated attacks are prone to generating invalid or ambiguous adversarial texts. Last but not least, language models continuously evolve and may be immune to parts of previously generated adversarial texts. To address these challenges, we introduce HITL-GAT, an interactive system based on a general approach to human-in-the-loop generation of adversarial texts. Additionally, we demonstrate the utility of HITL-GAT through a case study on Tibetan script, employing three customized adversarial text generation methods and establishing its first adversarial robustness benchmark, providing a valuable reference for other lower-resourced languages.

Paper Structure

This paper contains 19 sections, 1 equation, 3 figures.

Figures (3)

  • Figure 1: Workflow of HITL-GAT. When a new language model, downstream dataset, or textual adversarial attack method emerges, we can re-enter the loop so that the adversarial robustness benchmark evolves with it.
  • Figure 2: Flowchart of HITL-GAT. Our system contains four stages in one pipeline: victim model construction, adversarial example generation, high-quality benchmark construction, and adversarial robustness evaluation. System outputs are highlighted with a purple background, human choices with a yellow background, and human annotation with a red background.
  • Figure 3: Screenshots of HITL-GAT.
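
The four stages named in the Figure 2 caption can be illustrated with a minimal sketch. This is not the HITL-GAT implementation: every function name below is hypothetical, the victim model is a toy keyword classifier rather than a fine-tuned language model, the attack is a single character substitution on Latin-script toy text rather than one of the paper's Tibetan-specific methods, and the human annotator is simulated by a fixed set of approved strings.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Classifier = Callable[[str], int]


@dataclass
class AdvExample:
    original: str
    perturbed: str
    label: int


def build_victim_model() -> Classifier:
    """Stage 1 (toy stand-in): keyword classifier instead of a fine-tuned model."""
    return lambda text: 1 if "good" in text else 0


def generate_adversarial(model: Classifier,
                         dataset: List[Tuple[str, int]],
                         attack: Callable[[str], str]) -> List[AdvExample]:
    """Stage 2: keep perturbations that flip a previously correct prediction."""
    found = []
    for text, label in dataset:
        if model(text) != label:
            continue  # only attack inputs the model already classifies correctly
        candidate = attack(text)
        if model(candidate) != label:
            found.append(AdvExample(text, candidate, label))
    return found


def construct_benchmark(candidates: List[AdvExample],
                        human_accepts: Callable[[AdvExample], bool]) -> List[AdvExample]:
    """Stage 3: human annotation filters out invalid or ambiguous candidates."""
    return [ex for ex in candidates if human_accepts(ex)]


def evaluate_robustness(model: Classifier, benchmark: List[AdvExample]) -> float:
    """Stage 4: accuracy of a model on the curated adversarial benchmark."""
    if not benchmark:
        raise ValueError("benchmark is empty")
    correct = sum(model(ex.perturbed) == ex.label for ex in benchmark)
    return correct / len(benchmark)


# Toy data and a toy attack (assumptions for illustration only).
dataset = [("good movie", 1), ("bad movie", 0), ("good food", 1)]
attack = lambda text: text.replace("o", "0", 1)  # swap the first "o" for "0"

victim = build_victim_model()
candidates = generate_adversarial(victim, dataset, attack)

# Simulated annotator: only strings in this set are judged valid.
approved = {"g0od movie"}
benchmark = construct_benchmark(candidates, lambda ex: ex.perturbed in approved)

victim_accuracy = evaluate_robustness(victim, benchmark)

# A newer model enters the loop and is scored on the same benchmark.
newer_model: Classifier = lambda text: 1 if "good" in text.replace("0", "o") else 0
newer_accuracy = evaluate_robustness(newer_model, benchmark)
```

Scoring a second model on the same benchmark mirrors the motivation stated in the abstract: models evolve and may become immune to previously generated adversarial texts, so the benchmark has to be regenerated and re-annotated over time.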