Enhancing BERT Fine-Tuning for Sentiment Analysis in Lower-Resourced Languages

Jozef Kubík, Marek Šuppa, Martin Takáč

TL;DR

The paper tackles data scarcity in low-resource languages by focusing on data-efficient fine-tuning of BERT models for sentiment analysis. It introduces an integrated pipeline that combines Epistemic Neural Networks (via Epinet), Active Learning (entropy, BALD, variance), clustering (Agglomerative Ward), and scheduling strategies to optimize annotation usage. Empirical results across Slovak, Icelandic, Maltese, and Turkish show substantial annotation savings and performance gains (up to ~4 F1 points) with improved stability, suggesting broad applicability to other classification tasks and languages. The approach demonstrates that careful orchestration of data selection, uncertainty modeling, and sampling schedules can meaningfully close the gap between high- and low-resource language performance in LM fine-tuning.

Abstract

Limited data for low-resource languages typically yield weaker language models (LMs). Since pre-training is compute-intensive, it is more pragmatic to target improvements during fine-tuning. In this work, we examine the use of Active Learning (AL) methods augmented by structured data selection strategies, which we term 'Active Learning schedulers', to boost the fine-tuning process with a limited amount of training data. We connect AL to data clustering and propose an integrated fine-tuning pipeline that systematically combines AL, clustering, and dynamic data selection schedulers to enhance the model's performance. Experiments in the Slovak, Maltese, Icelandic and Turkish languages show that the use of clustering during the fine-tuning phase together with AL scheduling can simultaneously produce annotation savings of up to 30% and performance improvements of up to four F1 score points, while also providing better fine-tuning stability.
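
The acquisition functions named in the TL;DR (entropy, BALD, variance) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each unlabeled example comes with several sampled class distributions (for instance from different epistemic indices, an ensemble, or dropout passes), and the function names, the averaging used for the variance score, and the top-k selection step are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Sequence

# One class-probability vector, and a set of sampled vectors for one example.
Probs = Sequence[float]
Samples = Sequence[Probs]


def entropy(p: Probs) -> float:
    """Shannon entropy (in nats) of a single class distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def mean_distribution(samples: Samples) -> List[float]:
    """Average the sampled distributions class by class."""
    n = len(samples)
    return [sum(s[c] for s in samples) / n for c in range(len(samples[0]))]


def predictive_entropy(samples: Samples) -> float:
    """Entropy acquisition: entropy of the averaged prediction."""
    return entropy(mean_distribution(samples))


def bald(samples: Samples) -> float:
    """BALD: predictive entropy minus the mean entropy of the samples.

    High when the samples disagree with each other (epistemic uncertainty).
    """
    expected = sum(entropy(s) for s in samples) / len(samples)
    return predictive_entropy(samples) - expected


def variance_score(samples: Samples) -> float:
    """Variance acquisition (assumed form): per-class variance across
    samples, averaged over classes."""
    mean = mean_distribution(samples)
    n = len(samples)
    per_class = [
        sum((s[c] - mean[c]) ** 2 for s in samples) / n
        for c in range(len(mean))
    ]
    return sum(per_class) / len(per_class)


def select_top_k(
    pool: Sequence[Samples], k: int, score: Callable[[Samples], float]
) -> List[int]:
    """Return indices of the k pool examples with the highest score."""
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]), reverse=True)
    return ranked[:k]


# Two toy examples: samples that agree, and samples that disagree.
agree = [[0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9]]
chosen = select_top_k([agree, disagree], k=1, score=bald)
```

In this toy pool, BALD and the variance score are zero for the example whose samples agree and positive for the one whose samples disagree, so the latter would be sent for annotation first. Predictive entropy alone would also rank it higher, but it does not separate disagreement between samples from a consistently flat prediction.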

Paper Structure

This paper contains 23 sections, 7 figures, 18 tables.

Figures (7)

  • Figure 1: Data used in the fine-tuning of the models.
  • Figure 2: Data used in the fine-tuning of the models reaching baseline performance while using fewest data samples possible.
  • Figure 3: Learning curves of SlovakBERT models.
  • Figure 4: Learning curves of IceBERT models with different acquisition functions.
  • Figure 5: Learning curves of baseline model and the best performing AL models while fine-tuning BERTu.
  • ...and 2 more figures