Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in Hausa Language Using AfriBERTa

Sani Abdullahi Sani, Shamsuddeen Hassan Muhammad, Devon Jarvis

TL;DR

The paper tackles sentiment analysis for Hausa, a low-resource language, by applying Language-Adaptive Fine-Tuning (LAFT) to AfriBERTa. It builds a diverse unlabeled Hausa corpus from Hausa Global Media, Hausa Novel Store, and Scanned Literature, then fine-tunes AfriBERTa in a two-phase pipeline and evaluates on NaijaSenti. The results show modest gains from LAFT, partly due to the use of formal Hausa data, but confirm that a Hausa-pretrained AfriBERTa outperforms non-Hausa baselines. The work highlights the value of diverse data sources and provides open-source code and datasets to support reproducibility in low-resource African-language NLP.

Abstract

Sentiment analysis (SA) plays a vital role in Natural Language Processing (NLP) by identifying sentiments expressed in text. Although significant advances have been made in SA for widely spoken languages, low-resource languages such as Hausa face unique challenges, primarily due to a lack of digital resources. This study investigates the effectiveness of Language-Adaptive Fine-Tuning (LAFT) to improve SA performance in Hausa. We first curate a diverse, unlabeled corpus to expand the model's linguistic capabilities, then apply LAFT to adapt AfriBERTa specifically to the nuances of the Hausa language. The adapted model is then fine-tuned on the labeled NaijaSenti sentiment dataset to evaluate its performance. Our findings demonstrate that LAFT gives modest improvements, which may be attributed to the use of formal Hausa text rather than informal social media data. Nevertheless, the pre-trained AfriBERTa model significantly outperformed models not specifically trained on Hausa, highlighting the importance of using pre-trained models in low-resource contexts. This research emphasizes the necessity for diverse data sources to advance NLP applications for low-resource African languages. We published the code and the dataset to encourage further research and facilitate reproducibility in low-resource NLP here: https://github.com/Sani-Abdullahi-Sani/Natural-Language-Processing/blob/main/Sentiment%20Analysis%20for%20Low%20Resource%20African%20Languages
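
The LAFT phase described in the abstract adapts AfriBERTa on unlabeled Hausa text before the supervised fine-tuning step. The abstract does not state the training objective. A common choice for this kind of adaptation is to continue masked-language modelling, and the plain-Python sketch below shows how one such training example could be built under that assumption. The function name `mask_tokens`, the 15% masking rate, the 80/10/10 replacement split, and the toy token list (words borrowed from the Figure 5 caption, not a real sentence) are all illustrative assumptions, not details taken from the paper.

```python
import random

MASK = "<mask>"  # assumed mask symbol; the real tokenizer's symbol may differ


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Build one masked-language-modelling example (illustrative sketch).

    Returns (inputs, labels). labels[i] holds the original token at the
    positions the model would be asked to predict, and None elsewhere.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # This position is selected for prediction.
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # usually replace with the mask symbol
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # sometimes a random vocabulary token
            else:
                inputs.append(tok)                # sometimes leave the token unchanged
        else:
            # Position not selected: input is unchanged and carries no label.
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Toy tokens only; a real run would use the model's subword tokenizer.
toy_tokens = ["farin", "ciki", "da", "Allah", "ya", "isa"]
masked_inputs, masked_labels = mask_tokens(toy_tokens, vocab=toy_tokens, mask_prob=0.5)
print(masked_inputs)
print(masked_labels)
```

In a sketch like this, only the positions with a non-None label would contribute to the loss during adaptation. The second phase, fine-tuning on labeled NaijaSenti examples, would instead attach a classification head and train on sentiment labels.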

Paper Structure

This paper contains 20 sections, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Experimental Overview: Assessing the Impact of the Intermediate LAFT in a Two-Phase Method for Hausa Sentiment Analysis
  • Figure 2: Confusion Matrix for Downstream Task before LAFT (Baseline Model on the left), and after LAFT (on the right)
  • Figure 3: LAFT training and validation loss curves across five epochs, showing a consistent reduction that indicates effective learning. By the fifth epoch, however, the validation loss begins to rise slightly, a potential sign of overfitting
  • Figure 4: Training and validation loss for the downstream task before and after LAFT. The graph indicates that the model after LAFT (right) learns effectively, beginning with a lower training loss than the baseline model before LAFT (left), highlighting the benefits of the fine-tuning process
  • Figure 5: Attention maps highlighting key phrases in sentiment analysis, with a strong focus on "Allah ya isa" indicating negative sentiment. The right side shows the model attending to the phrases "farin ciki" (glad) and "da" (that), demonstrating its ability to capture positive sentiment in the text