Harnessing Test-time Adaptation for NLU tasks Involving Dialects of English
Duke Nguyen, Aditya Joshi, Flora Salim
TL;DR
This work addresses dialectal variability in English by applying test-time domain adaptation (TTDA), specifically SHOT, to dialects unseen during training. It constructs dialectally transformed GLUE data for Standard American English (SAE), Indian English (IndE), Nigerian English (NgE), and Singaporean English (SingE) using Multi-VALUE, and evaluates SHOT against in-dialect and cross-dialect fine-tuning. The results show that SHOT consistently improves performance when labeled data are scarce and often outperforms dialect-specific training. A new concept, the dialectal gap, is proposed and shown to have a strong positive relationship with the TTDA gain ($TTDA_{gain}$), especially when the source dialect is SAE. The findings point to a practical path toward dialect-robust natural language understanding that does not require labeled data for every dialect, with implications for deploying NLP systems across diverse English varieties.
Abstract
Test-time domain adaptation (TTDA) helps models generalize across domains, tasks, and distributions without the use of labeled datasets. This makes TTDA well suited to dialectal natural language processing (NLP), where models are often trained on Standard American English (SAE) but evaluated on Indian English (IndE), Singaporean English (SingE), or Nigerian English (NgE), whose distributions differ significantly from that of SAE. It is especially useful because labeled dialectal datasets are scarce. In this paper, we explore one of the best-known TTDA techniques, SHOT, in dialectal NLP. We fine-tune and evaluate SHOT on different combinations of dialectal GLUE. Our findings show that SHOT is a viable technique when labeled datasets are unavailable. We also propose, on theoretical grounds, the concept of the dialectal gap and show that it correlates positively with the effectiveness of SHOT. Finally, we find that in many cases, fine-tuning on SAE yields higher performance than fine-tuning on dialectal data.
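To make the unlabeled objective behind SHOT concrete, the following is a minimal sketch of the information-maximization loss from the original SHOT method: it rewards predictions that are individually confident (low per-example entropy) while staying diverse across the batch (high entropy of the mean prediction). The function names and the `eps` smoothing constant are illustrative assumptions, and the sketch omits SHOT's pseudo-labeling term and model update; it is not the exact configuration used in this paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def information_maximization_loss(batch_logits, eps=1e-8):
    """Sketch of SHOT's information-maximization objective (hypothetical helper).

    batch_logits: list of per-example logit lists from unlabeled target data.
    Returns mean per-example entropy minus the entropy of the mean prediction.
    Lower is better: confident individual predictions, diverse batch predictions.
    """
    probs = [softmax(row) for row in batch_logits]
    n = len(probs)
    k = len(probs[0])

    # Entropy term: average entropy of each example's predicted distribution.
    entropy = -sum(p * math.log(p + eps) for row in probs for p in row) / n

    # Diversity term: negative entropy of the batch-mean prediction.
    mean_pred = [sum(row[j] for row in probs) / n for j in range(k)]
    neg_mean_entropy = sum(p * math.log(p + eps) for p in mean_pred)

    return entropy + neg_mean_entropy


# Confident and diverse predictions score lower than collapsed ones.
diverse = information_maximization_loss([[10.0, -10.0], [-10.0, 10.0]])
collapsed = information_maximization_loss([[10.0, -10.0], [10.0, -10.0]])
```

In this sketch, a batch whose predictions are confident but split across classes receives a lower loss than a batch in which every example collapses onto the same class, which is the behavior the diversity term is meant to encourage when no target-dialect labels are available.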
