A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change

Francesco Periti; Nina Tahmasebi

TL;DR

An evaluation of state-of-the-art models and approaches for Graded Change Detection (GCD) under equal conditions shows that APD outperforms other approaches for GCD; XL-LEXEME outperforms other contextualized models for Word-in-Context (WiC), Word Sense Induction (WSI), and GCD; and there is a clear need to improve the modeling of word meanings.

Abstract

Contextualized embeddings are the preferred tool for modeling Lexical Semantic Change (LSC). Current evaluations typically focus on a specific task known as Graded Change Detection (GCD). However, performance comparisons across works are often misleading because they rely on diverse settings. In this paper, we evaluate state-of-the-art models and approaches for GCD under equal conditions. We further break the LSC problem into Word-in-Context (WiC) and Word Sense Induction (WSI) tasks, and compare models across these different levels. Our evaluation is performed across different languages on eight available benchmarks for LSC, and shows that (i) APD outperforms other approaches for GCD; (ii) XL-LEXEME outperforms other contextualized models for WiC, WSI, and GCD, while being comparable to GPT-4; (iii) there is a clear need to improve the modeling of word meanings, and to focus on how, when, and why these meanings change rather than solely on the extent of semantic change.
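To make the GCD setting concrete, here is a minimal sketch of the two form-based measures named in the paper outline (PRT and APD) and of the rank correlation used to score them. The abstract does not spell out either acronym; the expansions (average pairwise distance, prototype distance), the use of cosine distance, and all function names below are assumptions based on common usage in the LSC literature, not the authors' implementation. Each usage of a target word is assumed to be already encoded as a plain list of floats.

```python
import math
from itertools import product


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def apd(usages_t1, usages_t2):
    """Assumed APD: mean cosine distance over all cross-period usage pairs."""
    pairs = list(product(usages_t1, usages_t2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def prt(usages_t1, usages_t2):
    """Assumed PRT: cosine distance between the two mean (prototype) vectors."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    return cosine_distance(mean(usages_t1), mean(usages_t2))


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(predicted, gold):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(predicted), ranks(gold)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: 2-dimensional "embeddings" for two target words.
stable_t1, stable_t2 = [[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]
changed_t1, changed_t2 = [[1.0, 0.0]], [[0.0, 1.0]]
predicted = [apd(stable_t1, stable_t2), apd(changed_t1, changed_t2)]
```

In this setup, a GCD score for a model would be `spearman(predicted, gold)`, where `gold` holds the benchmark's graded change scores for the same target words in the same order. The difference between the two measures is that APD compares individual usages across periods, while PRT first collapses each period into a single averaged vector.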

Paper Structure (42 sections, 4 equations, 3 figures, 8 tables)

Introduction
Original contribution of our work
Background and related work
Approaches to Graded Change Detection
Comparison of approaches
Current modeling of LSC
A systematic comparison
Evaluation setup
Standard Graded Change Detection
Computational annotators
Comparing approaches for GCD
Form-based approaches
PRT
APD
Sense-based approaches
...and 27 more sections

Figures (3)

Figure 1: DWUG for the German word Eintagsfliege. Nodes represent word usages. Edges represent the relatedness between usages. Colors indicate clusters (senses) inferred from the full graph (Laicher et al., 2021).
Figure 2: Score distribution for GCD obtained by using all possible layer combinations of length 2 (e.g., Layer 1 and 2), length 3 (e.g., Layer 10, 11, 12), and length 4 (e.g., Layer 1, 10, 11, 12) for BERT, mBERT, and XLM-R. The y-axis represents the Spearman correlation. We highlight the performance for GCD obtained using Layer 8, Layer 12, and the sum of the last 4 layers (i.e., $\bigoplus$ 9-12).
Figure 3: Score distribution for GCD obtained by using all possible layer combinations of length 2 (e.g., Layer 1 and 2), length 3 (e.g., Layer 10, 11, 12), and length 4 (e.g., Layer 1, 10, 11, 12) for BERT, mBERT, and XLM-R. The y-axis represents the Spearman correlation. We highlight the performance for GCD obtained using Layer 8, Layer 12, and the sum of the last 4 layers (i.e., $\bigoplus$ 9-12).
