
Harnessing Test-time Adaptation for NLU tasks Involving Dialects of English

Duke Nguyen, Aditya Joshi, Flora Salim

TL;DR

This work addresses dialectal variability in English by applying test-time domain adaptation (TTDA) with SHOT to unseen dialects. It constructs dialectally transformed GLUE data for SAE, IndE, NgE, and SingE using Multi-VALUE and evaluates SHOT against in-dialect and cross-dialect fine-tuning, revealing that SHOT consistently improves performance when labeled data are scarce and often outperforms dialect-specific training. A new concept, the dialectal gap, is proposed and shown to correlate positively with the TTDA gain ($\text{TTDA}_{\text{gain}}$), especially when the source dialect is SAE. The findings demonstrate a practical path toward dialect-robust natural language understanding without requiring labeled data for every dialect, with implications for deploying robust NLP systems across diverse English varieties.
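To make the adaptation step more concrete, here is a minimal NumPy sketch of the information-maximization (IM) objective from the original SHOT method (Liang et al., 2020). SHOT keeps the source classifier (the "hypothesis") frozen and updates only the encoder on unlabeled target data. The sketch assumes the model emits class logits for a batch of unlabeled target-dialect examples. It is not the authors' code, and the function names are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def information_maximization_loss(logits: np.ndarray, eps: float = 1e-8) -> float:
    """SHOT-style IM loss on a batch of unlabeled target-dialect logits.

    L_ent: mean per-example entropy (low when each prediction is confident).
    L_div: sum_k pbar_k * log(pbar_k), the negative entropy of the batch-mean
           prediction (low when predictions are spread across classes).
    """
    p = softmax(logits)
    l_ent = -np.mean(np.sum(p * np.log(p + eps), axis=1))
    p_bar = p.mean(axis=0)
    l_div = np.sum(p_bar * np.log(p_bar + eps))
    return float(l_ent + l_div)
```

A batch of confident and diverse predictions drives the loss toward -log K (about -0.69 for two classes), while a batch that is uniform or collapsed onto one class stays near 0. This is why SHOT pairs the entropy term with the diversity term: entropy minimization alone would reward collapsing every target-dialect input onto a single label.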

Abstract

Test-time domain adaptation (TTDA) helps generalize models across domains, tasks, and distributions without requiring labeled data. This makes TTDA well suited to natural language processing (NLP) in the dialectal setting, where models are often trained on Standard American English (SAE) but evaluated on Indian English (IndE), Singaporean English (SingE), or Nigerian English (NgE), whose distributions differ significantly from that of SAE. This matters all the more because dialectal datasets are scarce. In this paper, we explore SHOT, one of the most widely used TTDA techniques, in dialectal NLP. We fine-tune and evaluate SHOT on different source-target combinations of dialectal GLUE. Our findings show that SHOT is a viable technique when labeled datasets are unavailable. We also theoretically propose the concept of the dialectal gap and show that it correlates positively with the effectiveness of SHOT. Finally, we find that in many cases, fine-tuning on SAE yields higher performance than fine-tuning on dialectal data.
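Besides the IM objective, the original SHOT method refines its targets with self-supervised pseudo-labels: it builds class centroids in feature space, weighted by the frozen classifier's probabilities, assigns each example to its nearest centroid by cosine distance, then recomputes the centroids from those hard labels and reassigns once more. The NumPy sketch below illustrates that step under the assumption that encoder features and softmax outputs for unlabeled target-dialect examples are already available. The names are hypothetical and this is not the paper's implementation.

```python
import numpy as np


def _normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """L2-normalize rows so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def shot_pseudo_labels(features: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """Centroid-based pseudo-labels in the style of SHOT.

    features: (n, d) encoder outputs for unlabeled target-dialect examples.
    probs:    (n, K) softmax outputs of the frozen source classifier.
    """
    f = _normalize(features)
    # Step 1: soft centroids, each example weighted by its class probability.
    centroids = probs.T @ f / (probs.sum(axis=0)[:, None] + 1e-8)
    labels = np.argmax(f @ _normalize(centroids).T, axis=1)
    # Step 2: recompute centroids from the hard labels and reassign once more.
    one_hot = np.eye(probs.shape[1])[labels]
    centroids = one_hot.T @ f / (one_hot.sum(axis=0)[:, None] + 1e-8)
    return np.argmax(f @ _normalize(centroids).T, axis=1)
```

The centroid step lets the geometry of the target-dialect features override individual classifier mistakes: an example the source classifier mislabels, but which sits inside a well-formed cluster, receives the cluster's label. In SHOT these pseudo-labels feed a cross-entropy term that is added to the IM loss with a weighting hyperparameter.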

Paper Structure

This paper contains 9 sections, 1 equation, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Dialectal SHOT pipeline. Blue indicates roberta-base weights pretrained on SAE. Red indicates the dataset or weights associated with source dialect A. Green indicates the dataset or weights associated with target dialect B. Yellow indicates weights associated with adaptation from dialect A to B.
  • Figure 2: Scatterplot of the dialectal gap against $\text{TTDA}_{\text{gain}}$. The correlation coefficient is 0.4169 and the p-value is 0.0305. The scatterplot shows a positive correlation between the dialectal gap and $\text{TTDA}_{\text{gain}}$, which is stronger for observations with SAE as the source dialect, for reasons discussed in the main text. Regression lines are plotted for the SAE-only data and for all data points.
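The statistics in the Figure 2 caption can be reproduced along the following lines, assuming the reported coefficient is a Pearson correlation and that each observation is a (dialectal gap, $\text{TTDA}_{\text{gain}}$) pair tagged by whether SAE was the source dialect. The helper names are hypothetical, and the sketch does not define either quantity; it only takes them as inputs. The two-sided p-value follows from the t statistic under Student's t distribution with n - 2 degrees of freedom; `scipy.stats.pearsonr(x, y)` returns both r and that p-value directly.

```python
import numpy as np


def correlation_summary(x: np.ndarray, y: np.ndarray) -> dict:
    """Pearson r, its t statistic (df = n - 2), and a least-squares line y = slope*x + intercept."""
    n = len(x)
    r = float(np.corrcoef(x, y)[0, 1])
    t = r * np.sqrt((n - 2) / (1.0 - r**2))
    slope, intercept = np.polyfit(x, y, deg=1)
    return {"n": n, "r": r, "t": float(t),
            "slope": float(slope), "intercept": float(intercept)}


def gap_vs_gain(gap, gain, sae_source) -> dict:
    """Summarize all observations and the SAE-source subset separately."""
    gap = np.asarray(gap, dtype=float)
    gain = np.asarray(gain, dtype=float)
    mask = np.asarray(sae_source, dtype=bool)
    return {"all": correlation_summary(gap, gain),
            "sae_only": correlation_summary(gap[mask], gain[mask])}
```

Because the t statistic scales with sqrt(n - 2), the p-value of 0.0305 reported alongside r = 0.4169 depends on the number of observations n, which the caption does not state.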