Comparative Analysis of Static and Contextual Embeddings for Analyzing Semantic Changes in Medieval Latin Charters

Yifan Liu, Gelila Tilahun, Xinxiang Gao, Qianfeng Wen, Michael Gervers

TL;DR

This paper presents the first computational analysis of semantic change pre- and post-Norman Conquest and the first systematic comparison of static and contextual embeddings in a scarce historical data set.

Abstract

The Norman Conquest of 1066 C.E. brought profound transformations to England's administrative, societal, and linguistic practices. The DEEDS (Documents of Early England Data Set) database offers a unique opportunity to explore these changes by examining shifts in word meanings within a vast collection of Medieval Latin charters. Computational linguistics typically analyzes semantic change using vector representations of words, such as static and contextual embeddings, but existing embeddings for Medieval Latin, a historical language with scarce data, are limited and may not be well suited to this task. This paper presents the first computational analysis of semantic change pre- and post-Norman Conquest and the first systematic comparison of static and contextual embeddings in a scarce historical data set. Our findings confirm that, consistent with existing studies, contextual embeddings outperform static word embeddings in capturing semantic change within a scarce historical corpus.

Paper Structure

This paper contains 19 sections, 1 equation, 2 figures, 4 tables, 1 algorithm.

Figures (2)

  • Figure 1: Distribution of cosine similarity for changed and unchanged words across different embedding models -- AN period (top) and NP period (bottom). The dashed lines represent the mean cosine similarity for changed and unchanged words across the two periods and for each model. The shaded areas represent the 95% confidence intervals.
  • Figure 2: Heatmaps showing the evaluation metrics varying across different hyperparameter settings, with $\delta_{\mu}$ (top) and $\rho$ (bottom).