Understanding the effects of language-specific class imbalance in multilingual fine-tuning

Vincent Jung; Lonneke van der Plas

TL;DR

This work modifies the traditional class weighing approach to imbalance by calculating class weights separately for each language, and shows that this helps mitigate the detrimental effects of language-specific class imbalance in multilingual fine-tuning.

Abstract

We study the effect of one type of imbalance often present in real-life multilingual classification datasets: an uneven distribution of labels across languages. We show evidence that fine-tuning a transformer-based Large Language Model (LLM) on a dataset with this imbalance leads to worse performance, a more pronounced separation of languages in the latent space, and the promotion of uninformative features. We modify the traditional class weighing approach to imbalance by calculating class weights separately for each language and show that this helps mitigate those detrimental effects. These results create awareness of the negative effects of language-specific class imbalance in multilingual fine-tuning and the way in which the model learns to rely on the separation of languages to perform the task.
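
To make the idea of per-language class weights concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common "balanced" heuristic (number of examples divided by the number of classes times the class count) and applies it within each language instead of over the whole dataset. The function name `per_language_class_weights` and the toy data are hypothetical, and the exact formula used in the paper is not stated in this summary.

```python
from collections import Counter


def per_language_class_weights(labels, languages):
    """Return {(language, label): weight}.

    Assumption: weights follow the "balanced" heuristic
    n_examples / (n_classes * class_count), computed separately
    inside each language rather than over the whole dataset.
    """
    pair_counts = Counter(zip(languages, labels))  # examples per (language, label)
    lang_counts = Counter(languages)               # examples per language
    # Number of distinct labels actually observed in each language.
    classes_per_lang = Counter(lang for lang, _ in pair_counts)
    return {
        (lang, label): lang_counts[lang] / (classes_per_lang[lang] * count)
        for (lang, label), count in pair_counts.items()
    }


# Toy dataset: English is mostly "pos", French is mostly "neg".
labels = ["pos", "pos", "pos", "neg", "pos", "neg", "neg", "neg"]
languages = ["en", "en", "en", "en", "fr", "fr", "fr", "fr"]

weights = per_language_class_weights(labels, languages)
# One weight per training example, looked up by that example's language and label.
example_weights = [weights[(lang, lab)] for lang, lab in zip(languages, labels)]
```

In this toy dataset the labels are balanced overall (four of each), so weights computed over the whole dataset would all be equal and would leave the imbalance untouched. Computing them within each language up-weights the minority label of that language. In a training loop, such per-example weights would typically multiply the unreduced per-example loss before averaging. Note that a label never observed in a language receives no weight in this sketch.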

Paper Structure (21 sections, 4 equations, 3 figures, 7 tables)

Introduction
Methods
Text classification
Language identification
Cumulative difference in SHAP values
Per-language class weighing
Experimental setup
Results and discussion
The imbalance worsens performance
The languages are more identifiable in the latent space
The model learns to rely on non-informative tokens
Amazon reviews
XNLI
Per-language class weighing mitigates the effect of the imbalance
Conclusion
...and 6 more sections

Figures (3)

Figure 1: Average cumulative difference in SHAP value by token category for mBERT.
Figure 2: Average cumulative difference in SHAP value by token category for XLM-R with the added masked input entropy maximisation loss.
Figure 3: Average cumulative difference in SHAP value by token category for mBERT with the added masked input entropy maximisation loss.
