(Chat)GPT v BERT: Dawn of Justice for Semantic Change Detection
Francesco Periti, Haim Dubossarsky, Nina Tahmasebi
TL;DR
This work addresses semantic change detection by evaluating off-the-shelf (Chat)GPT-3.5 against BERT on two diachronic WiC tasks: TempoWiC (short-term change) and the newly introduced HistoWiC (long-term change). The authors propose a controlled experimental framework with automatically generated prompts and varying in-context learning strategies. They compare the GPT API with the ChatGPT web interface and include a BERT baseline obtained by thresholding layer-wise cosine similarities. GPT-3.5 generally underperforms BERT, particularly on short-term change, although it performs relatively better on long-term historical change. Evaluation through the API is more reliable than through the web interface. The study highlights the limitations of off-the-shelf ChatGPT for diachronic semantics and suggests that BERT-style embeddings remain robust baselines. It also points to GPT-4 as a possible future improvement for lexical semantic change tasks.
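The BERT baseline mentioned above compares the contextual embeddings of the target word in the two sentences of a WiC pair. It labels the pair "same meaning" when their cosine similarity reaches a threshold, and selects the layer and threshold on labelled data.

The sketch below rests on several assumptions. The embeddings are taken as already extracted, one vector per layer per sentence. The threshold and layer are chosen by maximising accuracy on labelled pairs, which may not match the authors' exact procedure. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # target-word embeddings in sentence 1 and sentence 2


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def predict(sims: List[float], threshold: float) -> List[int]:
    """1 = same meaning (similarity >= threshold), 0 = different meaning."""
    return [1 if s >= threshold else 0 for s in sims]


def fit_threshold(sims: List[float], labels: List[int]) -> Tuple[float, float]:
    """Try every observed similarity as a threshold; keep the most accurate one.
    Assumption: accuracy on labelled pairs is the selection criterion."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(sims)):
        preds = predict(sims, t)
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def fit_layerwise(pairs_by_layer: Dict[int, List[Pair]],
                  labels: List[int]) -> Tuple[int, float, float]:
    """Fit one threshold per layer and return (best_layer, threshold, accuracy)."""
    best = None
    for layer, pairs in pairs_by_layer.items():
        sims = [cosine(u, v) for u, v in pairs]
        t, acc = fit_threshold(sims, labels)
        if best is None or acc > best[2]:
            best = (layer, t, acc)
    return best


# Toy usage with 2-d vectors standing in for hypothetical BERT layer outputs.
labels = [1, 1, 0, 0]
pairs_by_layer = {
    0: [([1, 0], [1, 0]), ([1, 0], [0, 1]), ([1, 0], [1, 0.1]), ([0, 1], [1, 0])],
    1: [([1, 0], [1, 0]), ([0, 1], [0, 1]), ([1, 0], [0, 1]), ([1, 1], [1, -1])],
}
print(fit_layerwise(pairs_by_layer, labels))
```

In this toy example, layer 1 separates the two classes perfectly, so it is selected along with its threshold. With real data, the fitted layer and threshold would then be applied unchanged to the test pairs.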
Abstract
In the universe of Natural Language Processing, Transformer-based language models like BERT and (Chat)GPT have emerged as lexical superheroes with great power to solve open research problems. In this paper, we focus on the temporal problem of semantic change and evaluate their ability to solve two diachronic extensions of the Word-in-Context (WiC) task: TempoWiC and HistoWiC. In particular, we investigate the potential of a novel, off-the-shelf technology, ChatGPT (and GPT) 3.5, compared to BERT, which represents a family of models that currently constitutes the state of the art for modeling semantic change. Our experiments represent the first attempt to assess the use of (Chat)GPT for studying semantic change. Our results indicate that ChatGPT performs significantly worse than the foundational GPT version. Furthermore, our results demonstrate that (Chat)GPT achieves slightly lower performance than BERT in detecting long-term changes but performs significantly worse in detecting short-term changes.
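Posing a WiC instance to (Chat)GPT means turning the target word and its two sentences into a textual prompt. The TL;DR mentions varying in-context learning strategies. A common pattern is zero-shot prompting (no solved examples) versus few-shot prompting (k labelled example pairs prepended to the query).

The sketch below shows that pattern only. The instruction wording, the `WiCPair` structure, the True/False answer format, and the answer parser are all hypothetical and are not the prompts used in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WiCPair:
    target: str
    sentence1: str
    sentence2: str
    label: Optional[bool] = None  # True = same meaning; None for the unlabelled query


# Hypothetical instruction wording, for illustration only.
INSTRUCTION = ("Do the following two sentences use the word '{target}' "
               "with the same meaning? Answer True or False.")


def format_pair(p: WiCPair, with_answer: bool) -> str:
    """Render one WiC pair; demonstrations include their gold answer."""
    text = "\n".join([
        INSTRUCTION.format(target=p.target),
        f"Sentence 1: {p.sentence1}",
        f"Sentence 2: {p.sentence2}",
        "Answer:",
    ])
    if with_answer:
        if p.label is None:
            raise ValueError("demonstration pairs must be labelled")
        text += f" {p.label}"
    return text


def build_prompt(query: WiCPair, demos: List[WiCPair]) -> str:
    """Zero-shot when `demos` is empty, otherwise k-shot with k = len(demos)."""
    blocks = [format_pair(d, with_answer=True) for d in demos]
    blocks.append(format_pair(query, with_answer=False))
    return "\n\n".join(blocks)


def parse_answer(reply: str) -> Optional[bool]:
    """Map a model reply to a binary label; None if it cannot be parsed."""
    r = reply.strip().lower()
    if r.startswith("true"):
        return True
    if r.startswith("false"):
        return False
    return None


demo = WiCPair("cell", "He was locked in a cell.", "The prisoner left his cell.", True)
query = WiCPair("cell", "He was locked in a cell.", "She charged her cell overnight.")
print(build_prompt(query, [demo]))
```

Returning `None` from `parse_answer` keeps unparseable replies separate from real predictions. This matters when model output is free-form text rather than a fixed label.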
