Prompt Balance Matters: Understanding How Imbalanced Few-Shot Learning Affects Multilingual Sense Disambiguation in LLMs

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough

TL;DR

The paper investigates how imbalanced few-shot prompting affects multilingual Word Sense Disambiguation (WSD) using GLOSSGPT across five languages and two large language models (GPT-4o and LLaMA-3.1-70B). It introduces three sampling strategies (Highest, Lowest, and Average Frequency Sharing) to construct few-shot prompts from BabelNet-backed knowledge, and evaluates on 300 examples per language. The results show no universally best strategy: English remains robust, while performance in the other languages varies by language and model, and predictions under imbalanced prompts are often biased toward high-frequency senses. The study highlights the need for language-aware, balanced prompting in multilingual WSD and points to adaptive sampling and multi-agent prompting as promising future directions, with code available for replication.

Abstract

Recent advances in Large Language Models (LLMs) have significantly reshaped the landscape of Natural Language Processing (NLP). Among the various prompting techniques, few-shot prompting has gained considerable attention for its practicality and effectiveness. This study investigates how few-shot prompting strategies affect the Word Sense Disambiguation (WSD) task, focusing on the biases introduced by imbalanced sample distributions. We use the GLOSSGPT prompting method, an advanced approach for English WSD, to test its effectiveness across five languages: English, German, Spanish, French, and Italian. Our results show that imbalanced few-shot examples can cause incorrect sense predictions in the non-English languages, but this issue does not appear in English. To assess model behavior, we evaluate both the GPT-4o and LLaMA-3.1-70B models. The results highlight the sensitivity of multilingual WSD to sample distribution in few-shot settings, emphasizing the need for balanced and representative prompting strategies.

Paper Structure

This paper contains 16 sections, 4 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Sense distribution of the selected samples for each language. The order is English, German, Spanish, French, and Italian.
  • Figure 2: The data flow of the experiment process.
  • Figure 3: Few-shot knowledge base arrangement. For demonstration, the French branch is shown. A similar arrangement is followed for German, Italian and Spanish.