Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in Hausa Language Using AfriBERTa
Sani Abdullahi Sani, Shamsuddeen Hassan Muhammad, Devon Jarvis
TL;DR
The paper addresses sentiment analysis for Hausa, a low-resource language, by applying Language-Adaptive Fine-Tuning (LAFT) to AfriBERTa. It builds a diverse unlabeled Hausa corpus from Hausa Global Media, Hausa Novel Store, and Scanned Literature. It then adapts and fine-tunes AfriBERTa in a two-phase pipeline and evaluates the result on NaijaSenti. LAFT yields only modest gains, possibly because the adaptation corpus consists of formal Hausa text rather than informal social media data. The results do confirm that AfriBERTa, which was pre-trained on Hausa, outperforms baselines not trained on Hausa. The work highlights the value of diverse data sources and releases open-source code and datasets to support reproducibility in low-resource African-language NLP.
Abstract
Sentiment analysis (SA) is a core Natural Language Processing (NLP) task that identifies the sentiment expressed in text. Although SA has advanced significantly for widely spoken languages, low-resource languages such as Hausa face unique challenges, primarily due to a lack of digital resources. This study investigates whether Language-Adaptive Fine-Tuning (LAFT) improves SA performance in Hausa. We first curate a diverse, unlabeled Hausa corpus to broaden the model's linguistic coverage, then apply LAFT to adapt AfriBERTa to the nuances of the Hausa language. Finally, we fine-tune the adapted model on the labeled NaijaSenti sentiment dataset to evaluate its performance. Our findings show that LAFT yields only modest improvements, which may be attributed to the use of formal Hausa text rather than informal social media data. Nevertheless, the pre-trained AfriBERTa model significantly outperforms models not specifically trained on Hausa, highlighting the importance of models pre-trained on the target language in low-resource contexts. This research underscores the need for diverse data sources to advance NLP applications for low-resource African languages. We release the code and dataset to encourage further research and facilitate reproducibility in low-resource NLP: https://github.com/Sani-Abdullahi-Sani/Natural-Language-Processing/blob/main/Sentiment%20Analysis%20for%20Low%20Resource%20African%20Languages
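LAFT is generally understood as continuing a model's masked-language-modelling (MLM) pre-training on unlabeled target-language text before supervised fine-tuning. The sketch below illustrates the core data step of that first phase: selecting and masking tokens in a Hausa sequence. It assumes the standard BERT/RoBERTa recipe (15% of tokens selected; of these, 80% replaced by a mask token, 10% by a random token, 10% left unchanged). The masking rate, token ids, and vocabulary size used in this study are not stated here, so every value and name below, including `mask_tokens` and `IGNORE_INDEX`, is hypothetical.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

IGNORE_INDEX = -100  # hypothetical label value that the MLM loss is assumed to skip


def mask_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking for one tokenized sequence (illustrative sketch).

    Each non-special token is selected with probability mlm_prob. A selected
    token is replaced by mask_id 80% of the time, by a random vocabulary id
    10% of the time, and kept unchanged 10% of the time. Labels hold the
    original id at selected positions and IGNORE_INDEX everywhere else.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens such as <s> or </s>.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # otherwise the original token is left in place
    return inputs, labels


# Hypothetical ids: 0 = <s>, 2 = </s>, 4 = <mask>; 5..104 stand in for Hausa subwords.
ids = [0] + list(range(5, 105)) + [2]
inp, lab = mask_tokens(ids, mask_id=4, vocab_size=1000, special_ids={0, 2},
                       rng=random.Random(0))
```

In the second phase, the adapted encoder would be fine-tuned with a classification head on the labeled NaijaSenti examples. Masking only changes the pre-training inputs and does not affect that supervised step.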
