Definition generation for lexical semantic change detection
Mariia Fedorova, Andrey Kutuzov, Yves Scherrer
TL;DR
This paper investigates using contextualized definitions generated by multilingual LLMs as semantic representations for diachronic lexical semantic change detection (LSCD). Embedding these definitions and applying standard LSCD aggregators (average pairwise distance, APD; prototype distance, PRT; and their mean) yields competitive performance on English, Norwegian, and Russian benchmarks, often surpassing token-embedding baselines. The paper also introduces an interpretable pathway, definitions-as-senses, with merging strategies that reduce noise and let lexicographers inspect which definitions drive a shift. Generated definitions thus provide both a strong quantitative signal and human-readable explanations, a step toward explainable semantic change modeling in a multilingual setting. Code and models are released for reproducibility.
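To make the aggregators concrete, the following is a minimal sketch of APD and PRT over two sets of definition embeddings. It assumes cosine distance, a plain unweighted mean of the two scores, and toy two-dimensional vectors; the function names are illustrative and the exact formulation used in the paper may differ.

```python
import math
from itertools import product


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def apd(period1, period2):
    """Average pairwise distance: mean distance over all cross-period pairs."""
    pairs = list(product(period1, period2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def prototype(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def prt(period1, period2):
    """Prototype distance: distance between the two period-level mean vectors."""
    return cosine_distance(prototype(period1), prototype(period2))


def apd_prt_mean(period1, period2):
    """Unweighted mean of APD and PRT (an assumption about how they are combined)."""
    return (apd(period1, period2) + prt(period1, period2)) / 2.0


# Toy embeddings of definitions generated for one target word (made-up values).
old_period = [[1.0, 0.0], [0.9, 0.1]]
new_period = [[0.0, 1.0], [0.1, 0.9]]
score = apd_prt_mean(old_period, new_period)
```

A higher score suggests a larger shift between the two periods; ranking target words by this score gives the graded change ranking that LSCD benchmarks evaluate.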
Abstract
We use contextualized word definitions generated by large language models as semantic representations in the task of diachronic lexical semantic change detection (LSCD). In short, generated definitions are used as "senses", and the change score of a target word is obtained by comparing the distributions of these senses in the two time periods. Using five datasets in three languages, we show that generated definitions are both specific and general enough to convey a signal sufficient to rank sets of words by the degree of their semantic change over time. Our approach is on par with or outperforms prior unsupervised sense-based LSCD methods. At the same time, it preserves interpretability and makes it possible to inspect the reasons behind a specific shift in terms of discrete definitions-as-senses. This is another step in the direction of explainable semantic change modeling.
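The definitions-as-senses idea can be sketched as follows: treat each distinct generated definition as a sense label, build a relative-frequency distribution over those labels for each period, and compare the two distributions. The abstract does not name the comparison measure, so the Jensen-Shannon divergence used here is an assumption, as are the function names and the toy definitions.

```python
import math
from collections import Counter


def sense_distribution(definitions, senses):
    """Relative frequency of each sense label among one period's definitions."""
    counts = Counter(definitions)
    total = len(definitions)
    return [counts[sense] / total for sense in senses]


def jensen_shannon_divergence(p, q):
    """Base-2 Jensen-Shannon divergence; ranges from 0 (identical) to 1 (disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def change_score(defs_period1, defs_period2):
    """Compare the sense distributions of one target word across two periods."""
    senses = sorted(set(defs_period1) | set(defs_period2))
    p = sense_distribution(defs_period1, senses)
    q = sense_distribution(defs_period2, senses)
    return jensen_shannon_divergence(p, q)


# Made-up definitions for the word "mouse", one per usage example.
period1 = ["a small rodent", "a small rodent", "a small rodent"]
period2 = ["a small rodent", "a computer pointing device", "a computer pointing device"]
score = change_score(period1, period2)
```

Because each sense is a readable definition, the per-sense frequencies in `p` and `q` can be inspected directly to see which definitions gained or lost ground between the periods.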
