From cart to truck: meaning shift through words in English in the last two centuries

Esteban Rodríguez Betancourt, Edgar Casasola Murillo

TL;DR

This study examines how the same concepts are expressed by different words across two centuries, using diachronic word embeddings. The authors train word2vec skip-gram embeddings per decade from 1800 to 2000 on Google N-Grams, align each decade's space to the 1990s with orthogonal Procrustes, and analyze the nearest neighbors of selected concepts. The findings reveal domain-specific shifts across energy, transport, entertainment, and computing that reflect technological and social change, such as petroleum/diesel replacing coal/steam and theatres giving way to cinema/television. The work demonstrates the utility of an onomasiological lens for historical linguistics and provides a framework for interpreting dynamic word usage, while noting the need for expert interpretation due to biases and data limitations.

Abstract

This onomasiological study uses diachronic word embeddings to explore how different words have represented the same concepts over time, drawing on historical word data from 1800 to 2000. We identify shifts in the energy, transport, entertainment, and computing domains, revealing connections between language and societal change. Our approach trains diachronic word embeddings using word2vec with skip-gram and aligns them using orthogonal Procrustes. We discuss possible difficulties linked to the relationships the method identifies. Moreover, we consider the ethical aspects of interpreting the results, highlighting the need for expert insight to understand the method's significance.
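
The alignment and nearest-neighbor steps described in the abstract can be illustrated with a minimal sketch. It assumes two embedding matrices whose rows cover the same vocabulary in the same order; the function names `procrustes_align` and `nearest_neighbors`, the toy vocabulary, and the random data are hypothetical stand-ins, not the authors' code or the Google N-Grams embeddings.

```python
import numpy as np

def procrustes_align(source, target):
    """Rotate `source` onto `target` with an orthogonal map.

    Both matrices are (n_words, dim) and must list the same words
    in the same row order (an assumption of this sketch).
    """
    # The orthogonal R minimizing ||source @ R - target||_F is U @ Vt,
    # where U, S, Vt is the SVD of source.T @ target.
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation, rotation

def nearest_neighbors(query, matrix, vocab, k=3):
    """Return the k words in `matrix` closest to `query` by cosine similarity."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    top = np.argsort(-(m @ q))[:k]
    return [vocab[i] for i in top]

# Toy data: the "earlier decade" is just a rotated copy of the reference space.
rng = np.random.default_rng(0)
vocab = [f"w{i}" for i in range(50)]                 # hypothetical vocabulary
target = rng.normal(size=(50, 8))                    # stand-in for the 1990s space
true_rot, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
source = target @ true_rot.T                         # stand-in for an earlier decade

aligned, rotation = procrustes_align(source, target)
neighbors = nearest_neighbors(target[3], aligned, vocab, k=1)
```

In this toy setup the rotation is recovered exactly because the two spaces differ only by an orthogonal transform. With real per-decade embeddings the alignment is only approximate, and the residual differences after alignment are what a nearest-neighbor query of a reference-decade concept vector against an earlier decade's aligned space would surface.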

Paper Structure

This paper contains 10 sections, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Number of words per decade after cleaning zeroed embeddings.