Survey in Characterization of Semantic Change

Jader Martins Camboim de Sá, Marcos Da Silveira, Cédric Pruski

TL;DR

This survey addresses a gap in semantic-change research by focusing on characterization rather than identification, proposing a three-pole taxonomy of change: dimension, relation, and orientation. It reviews corpora and representation methods (frequency, topics, graphs, embeddings) and offers a formal, sense-based framework with a sense-objectivism assumption to unify the analysis. The authors provide a critical synthesis of existing approaches, identify gaps (notably in relation and orientation characterization), and illustrate the framework with concrete examples using SEMCOR/MASC and WordNet resources. The work highlights the need for standardized datasets, metrics, and multilingual resources to advance robust characterization and improve NLP applications that depend on historical or evolving meanings.

Abstract

Living languages continuously evolve to integrate the cultural changes of human societies. This evolution manifests through neologisms (new words) or semantic changes of words (new meanings given to existing words). Understanding the meaning of words is vital for interpreting texts coming from different cultures (regionalisms or slang), domains (e.g., technical terms), or periods. In computer science, word meanings are relevant to computational linguistics algorithms such as translation, information retrieval, and question answering. Semantic changes can potentially impact the quality of the outcomes of these algorithms. Therefore, it is important to understand and characterize these changes formally. The study of this impact is a recent problem that has attracted the attention of the computational linguistics community. Several approaches propose methods to detect semantic changes with good precision, but more effort is needed to characterize how the meaning of words changes and to reason about how to reduce the impact of semantic change. This survey provides an accessible overview of existing approaches to the characterization of semantic changes and formally defines three classes of characterization: whether the meaning of a word becomes more general or narrower (change in dimension), whether the word is used in a more pejorative or more positive (ameliorated) sense (change in orientation), and whether there is a trend to use the word in, for instance, a metaphoric or metonymic context (change in relation). We summarize the main aspects of the selected publications in a table and discuss the needs and trends in research on semantic change characterization.
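
As a reading aid, the three classes of characterization named in the abstract can be sketched as a small data structure. This is a minimal illustration, not the survey's formalism: the names `Pole`, `LABELS`, and `ChangeCharacterization` are hypothetical, the label sets contain only the examples the abstract mentions, and the label assigned to 'awful' is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Pole(Enum):
    """The three classes of characterization listed in the abstract."""
    DIMENSION = "dimension"      # meaning becomes more general or narrower
    ORIENTATION = "orientation"  # sense becomes more pejorative or ameliorated
    RELATION = "relation"        # e.g. metaphoric or metonymic use


# Hypothetical label sets, limited to the examples named in the abstract.
LABELS = {
    Pole.DIMENSION: {"generalization", "narrowing"},
    Pole.ORIENTATION: {"amelioration", "pejoration"},
    Pole.RELATION: {"metaphor", "metonymy"},
}


@dataclass(frozen=True)
class ChangeCharacterization:
    """One characterized change: a word, the pole it falls under, and a label."""
    word: str
    pole: Pole
    label: str

    def __post_init__(self):
        # Reject labels that do not belong to the chosen pole.
        if self.label not in LABELS[self.pole]:
            raise ValueError(f"{self.label!r} is not a {self.pole.value} label")


# Assumed example: 'awful' treated as a change in orientation.
example = ChangeCharacterization("awful", Pole.ORIENTATION, "pejoration")
```

Keeping the pole and the label separate makes it explicit that a detected change still has to be assigned to a class before it counts as characterized, which is the distinction the abstract draws between detection and characterization.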

Paper Structure (37 sections, 22 equations, 13 figures, 6 tables)

This paper contains 37 sections, 22 equations, 13 figures, 6 tables.

Figures (13)

  • Figure 1: Taxonomy for the poles of Lexical Semantic Change, based on the work of koch2016meaning, Blank2003Polysemy, and HockJoseph2019Semantic.
  • Figure 2: Change in meaning and orientation for the word 'awful'. On the left, we reproduce the figure from Hamilton2016DiachronicWE that shows the evolution of the word 'awful' in the embedding space. On the right, we present the hypothetical function ($\mathfrak{f}$) for this word over time.
  • Figure 3: Graph of selected works (green) and related articles (blue).
  • Figure 4: Figure adapted from Inoue2022InfiniteSA. The stacked bar plots represent the topics obtained over time for the words 'coach', 'record' and 'power' respectively. We can observe new senses emerging and becoming dominant.
  • Figure 5: Adapted from Ehmller2020SenseTD. Ego-network for 'mouse', built from a word co-occurrence graph. We observe that in 1830 the word was used with the senses 'weak' and 'rat', whereas in 1960 the sense 'computer device' emerged.
  • ...and 8 more figures