Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages

Poulami Ghosh, Raj Dabre, Pushpak Bhattacharyya

TL;DR

This study investigates whether pre-trained language models are robust to linguistically grounded perturbations across 12 Indic languages. It introduces a black-box adversarial framework that leverages phonological and orthographic perturbations to craft linguistically plausible adversaries, evaluated with LaBSE, chrF, and BERTScore on IndicXTREME tasks. Results reveal that while linguistic perturbations can reliably fool models, non-linguistic attacks generally achieve higher disruption, with variability across language families and scripts. The work highlights the importance of accounting for linguistic and script diversity in robustness research and points to adversarial training and cross-script resources as potential defenses.

Abstract

Pre-trained language models (PLMs) are known to be susceptible to perturbations to the input text, but existing works do not explicitly focus on linguistically grounded attacks, which are subtle and more prevalent in nature. In this paper, we study whether PLMs are agnostic to linguistically grounded attacks or not. To this end, we offer the first study addressing this, investigating different Indic languages and various downstream tasks. Our findings reveal that although PLMs are susceptible to linguistic perturbations, when compared to non-linguistic attacks, PLMs exhibit a slightly lower susceptibility to linguistic attacks. This highlights that even constrained attacks are effective. Moreover, we investigate the implications of these outcomes across a range of languages, encompassing diverse language families and different scripts.

Paper Structure

This paper contains 27 sections, 1 equation, 4 figures, 23 tables.

Figures (4)

  • Figure 1: The substitution of ब (ba) with the orthographically similar व (va) in the target word बेकार (bekaar) causes the model to misclassify the text with high confidence. This highlights the sensitivity of language models to subtle variations in the input text, where altering a single character can lead to a significant shift in the model's output. The input text is cropped to fit within the image.
  • Figure 2: Similar characters across different scripts
  • Figure 3: The impact of linguistic perturbations across different languages. The x-axis lists the ISO codes for the languages, as specified in the paper's language table. Bodo (bd) and Tamil (ta) are not present in all tasks, so for consistency the plot shows only the remaining 10 languages. For IndicParaphrase and IndicXNLI, we illustrate the relative decrease in performance due to perturbations in sentence1 and premise, respectively.
  • Figure 4: Examples of generated adversarial text across different languages
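
The single-character substitution described in Figure 1 can be sketched in a few lines of Python. This is only an illustration of the substitution step, not the authors' framework: the similarity map `SIMILAR_CHARS` and the helpers `perturb_word` and `perturb_text` are hypothetical names, the map contains just the one ba/va pair mentioned in the caption (assumed here to be the Devanagari characters U+092C and U+0935), and no model is queried, so the black-box search for the most damaging substitution is left out.

```python
# Hypothetical similarity map for illustration only; the paper's actual
# character inventory is not reproduced here.
# U+092C is Devanagari "ba", U+0935 is Devanagari "va".
SIMILAR_CHARS = {"\u092c": "\u0935"}


def perturb_word(word, similar=SIMILAR_CHARS, max_swaps=1):
    """Replace up to `max_swaps` characters with an orthographically similar one."""
    out = []
    swaps = 0
    for ch in word:
        if swaps < max_swaps and ch in similar:
            out.append(similar[ch])
            swaps += 1
        else:
            out.append(ch)
    return "".join(out)


def perturb_text(text, target_word, similar=SIMILAR_CHARS):
    """Perturb only tokens equal to `target_word`; other tokens are unchanged."""
    tokens = text.split(" ")
    return " ".join(
        perturb_word(tok, similar) if tok == target_word else tok
        for tok in tokens
    )


if __name__ == "__main__":
    # "bekaar" spelled in Devanagari code points: ba, e-matra, ka, aa-matra, ra.
    bekaar = "\u092c\u0947\u0915\u093e\u0930"
    print(perturb_word(bekaar))
```

Restricting the change to one character in one target word keeps the perturbed text close to the original, which is the property that similarity metrics such as chrF are meant to check.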