TSCheater: Generating High-Quality Tibetan Adversarial Texts via Visual Similarity
Xi Cao, Quzong Gesang, Yuan Sun, Nuo Qun, Tashi Nyima
TL;DR
This work addresses the vulnerability of Tibetan language models to textual adversarial attacks by incorporating features specific to Tibetan script. It introduces TSCheater, a method that uses a Tibetan syllable visual similarity database (TSVSDB) to generate visually consistent substitutions and a greedy scoring mechanism to order them. The attack is formalized as $x' = x + \delta$ under $\|\delta\|_\infty \le \epsilon$, with optimization of $P(y \mid x')$. The authors build AdvTS, the first Tibetan adversarial robustness benchmark, and show that TSCheater outperforms baselines in attack effectiveness, perturbation magnitude, semantic and visual similarity, and human acceptance, and that it transfers to other abugidas such as Devanagari. The resources (TSVSDB, TSCheater, AdvTS) are publicly available. The work highlights the practical significance of security and robustness in low-resource, cross-border language settings and calls for broader evaluation of adversarial robustness in such languages.
Abstract
Language models based on deep neural networks are vulnerable to textual adversarial attacks. While high-resource languages like English receive focused attention, Tibetan, a cross-border language, is gradually being studied because of its abundant ancient literature and its critical role in language strategy. Several Tibetan adversarial text generation methods currently exist, but they do not fully consider the textual features of Tibetan script and they overestimate the quality of the generated adversarial texts. To address this issue, we propose a novel Tibetan adversarial text generation method called TSCheater, which accounts for the characteristics of Tibetan encoding and the feature that visually similar syllables have similar semantics. This method can also be transferred to other abugidas, such as Devanagari script. We use a self-constructed Tibetan syllable visual similarity database called TSVSDB to generate substitution candidates and adopt a greedy algorithm-based scoring mechanism to determine the substitution order. We then evaluate the method on eight victim language models. Experimentally, TSCheater outperforms existing methods in attack effectiveness, perturbation magnitude, semantic similarity, visual similarity, and human acceptance. Finally, we construct the first Tibetan adversarial robustness evaluation benchmark, called AdvTS, which is generated by existing methods and proofread by humans.
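The two steps named in the abstract, generating candidates from a visual similarity database and ordering substitutions greedily by score, can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_DB`, `toy_victim`, `greedy_attack`, the `min_sim` and `threshold` parameters, and the ASCII placeholder tokens are all assumptions made for illustration. The real TSVSDB holds Tibetan syllables, and the real victims are language models.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for TSVSDB: syllable -> [(similar syllable, similarity)].
# ASCII placeholders are used instead of real Tibetan syllables.
TOY_DB: Dict[str, List[Tuple[str, float]]] = {
    "A": [("A1", 0.95), ("A2", 0.80)],
    "B": [("B1", 0.93)],
}


def toy_victim(syllables: List[str]) -> float:
    """Toy victim model: returns the probability of the true label."""
    prob = 0.2
    if "A" in syllables:
        prob += 0.5
    if "B" in syllables:
        prob += 0.2
    return prob


def greedy_attack(
    syllables: List[str],
    victim: Callable[[List[str]], float],
    db: Dict[str, List[Tuple[str, float]]],
    min_sim: float = 0.9,    # assumed similarity cutoff for candidates
    threshold: float = 0.5,  # assumed decision boundary of the victim
) -> Optional[List[str]]:
    """Return an adversarial syllable list, or None if the attack fails."""
    base = victim(syllables)
    scored: List[Tuple[float, int, str]] = []
    for i, syl in enumerate(syllables):
        best: Optional[Tuple[float, int, str]] = None
        for cand, sim in db.get(syl, []):
            if sim < min_sim:
                continue  # skip candidates that look too different
            trial = syllables[:i] + [cand] + syllables[i + 1:]
            drop = base - victim(trial)  # score: drop in true-label probability
            if best is None or drop > best[0]:
                best = (drop, i, cand)
        if best is not None and best[0] > 0:
            scored.append(best)

    # Greedy order: apply the most damaging substitution first.
    scored.sort(key=lambda item: item[0], reverse=True)
    adversarial = list(syllables)
    for _drop, i, cand in scored:
        adversarial[i] = cand
        if victim(adversarial) < threshold:
            return adversarial  # prediction flipped, stop early
    return None


print(greedy_attack(["A", "B", "C"], toy_victim, TOY_DB))
```

In this toy setting, replacing the single highest-scoring syllable is enough to push the true-label probability below the threshold, so the search stops after one substitution. Stopping early like this is one way to keep the perturbation magnitude small.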
