Exploring the Word Sense Disambiguation Capabilities of Large Language Models
Pierpaolo Basile, Lucia Siciliani, Elio Musacchio, Giovanni Semeraro
TL;DR
This work investigates how open Large Language Models (LLMs) perform on Word Sense Disambiguation (WSD) by recasting the XL-WSD benchmark as two tasks: generating a definition of a word in context, and selecting the correct sense from a predefined set. It systematically compares zero-shot and fine-tuned settings across five languages, using BabelNet as the sense inventory and XL-WSD as the multilingual corpus, including translations of missing glosses. The key finding is that zero-shot LLMs generally lag behind state-of-the-art methods, whereas a medium-sized model fine-tuned on the benchmark achieves state-of-the-art performance across languages, notably in English. The study releases open resources (data, code, and fine-tuned models) and points to future directions: broader language coverage, few-shot prompting, and evaluation of more open models on WSD.
Abstract
Word Sense Disambiguation (WSD) is a long-standing task in computational linguistics that has received much attention over the years. However, with the advent of Large Language Models (LLMs), interest in this task (in its classical definition) has decreased. In this study, we evaluate the performance of various LLMs on the WSD task. We extend a previous benchmark (XL-WSD) to define two subtasks suited to LLMs: 1) given a word in a sentence, the LLM must generate the correct definition; 2) given a word in a sentence and a set of predefined meanings, the LLM must select the correct one. The extended benchmark is built using XL-WSD and BabelNet. The results indicate that LLMs perform well in zero-shot learning but cannot surpass current state-of-the-art methods. However, a fine-tuned model with a medium number of parameters outperforms all other models, including the state-of-the-art.
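To make the two subtasks concrete, the following is a minimal sketch of how a single benchmark instance might be turned into a generation prompt and a selection prompt. It is an illustration under stated assumptions, not the prompt format used in the study: the `WSDInstance` structure, the prompt wording, the example sentence and glosses, and the `parse_choice` helper are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WSDInstance:
    """One hypothetical benchmark item: a target word in a sentence
    plus candidate glosses taken from a sense inventory."""
    sentence: str
    target: str
    candidates: List[str]
    gold_index: int  # 0-based position of the correct gloss


def generation_prompt(inst: WSDInstance) -> str:
    """Subtask 1: ask the model to write the definition itself."""
    return (
        f'Give a brief definition of the word "{inst.target}" '
        f"as it is used in the sentence below.\n"
        f"Sentence: {inst.sentence}\n"
        f"Definition:"
    )


def selection_prompt(inst: WSDInstance) -> str:
    """Subtask 2: ask the model to pick one of the predefined meanings."""
    options = "\n".join(
        f"{i}) {gloss}" for i, gloss in enumerate(inst.candidates, start=1)
    )
    return (
        f'Which meaning does the word "{inst.target}" have '
        f"in the sentence below? Answer with the option number.\n"
        f"Sentence: {inst.sentence}\n"
        f"{options}\n"
        f"Answer:"
    )


def parse_choice(answer: str, n_options: int) -> Optional[int]:
    """Map a free-text answer to a 0-based option index.

    Returns None when no number in the valid range is found, so that
    unparseable answers can be counted as errors.
    """
    match = re.search(r"\d+", answer)
    if match is None:
        return None
    choice = int(match.group())
    return choice - 1 if 1 <= choice <= n_options else None
```

With this framing, the selection subtask can be scored by comparing `parse_choice(model_answer, len(inst.candidates))` with `inst.gold_index`. The generation subtask is harder to score automatically, because a generated definition has to be compared against a reference gloss rather than matched exactly.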
