Dynamic embedded topic models and change-point detection for exploring literary-historical hypotheses

Hale Sirin, Tom Lippincott

TL;DR

The paper presents a framework combining a dynamic embedded topic model with change-point detection to study diachronic semantic modality in Classical and early Christian Latin, using bimodality as a core signal. It builds on a Perseus-derived Latin corpus, lemmatization, and 75-year windows to track topic distributions and semantic shifts, introducing a bimodality metric and author-novelty measure via Jensen-Shannon divergence. Case studies of words like manus, figura, and effigies reveal both rapid and gradual diffusion patterns and a broad trend of decreasing modality-shift effects around 200 CE, aligned with a shift toward standardization in early Christian rhetoric. The work provides a practical unsupervised approach to surface linguistically and historically meaningful patterns, with potential extensions to broader or noisier corpora and a user-friendly interface for humanists.
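
The author-novelty measure mentioned above relies on Jensen-Shannon divergence. As a rough illustration only, the sketch below computes the base-2 Jensen-Shannon divergence between two discrete distributions. The function names, the example topic distributions, and the idea of comparing an author against a single background distribution are assumptions made for illustration; this summary does not specify what the paper compares each author against.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (base 2, range 0..1)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # The mixture m is positive wherever p or q is positive, so the KL terms are well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic distributions (assumed values, not taken from the paper).
author_topics = [0.7, 0.2, 0.1]
background_topics = [0.3, 0.4, 0.3]
novelty = jensen_shannon_divergence(author_topics, background_topics)
```

With base-2 logarithms the divergence is 0 for identical distributions and 1 for distributions with disjoint support, which makes scores comparable across authors.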

Abstract

We present a novel combination of dynamic embedded topic models and change-point detection to explore diachronic change of lexical semantic modality in classical and early Christian Latin. We demonstrate several methods for finding and characterizing patterns in the output, and relating them to traditional scholarship in Comparative Literature and Classics. This simple approach to unsupervised models of semantic change can be applied to any suitable corpus, and we conclude with future directions and refinements aiming to allow noisier, less-curated materials to meet that threshold.
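
To make the idea of a change-point and its delta more concrete, here is a minimal sketch of single change-point detection on a short series of per-window scores. It is not the paper's procedure: the mean-shift criterion, the function name `best_change_point`, and the example scores are all assumptions chosen for illustration.

```python
def best_change_point(values):
    """Return (index, delta) for the split that maximizes the absolute mean shift.

    `index` is the position of the first value after the change, and `delta`
    is mean(values[index:]) - mean(values[:index]).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to place a change-point")
    best_index, best_delta = None, 0.0
    for k in range(1, len(values)):
        before = sum(values[:k]) / k
        after = sum(values[k:]) / (len(values) - k)
        delta = after - before
        if best_index is None or abs(delta) > abs(best_delta):
            best_index, best_delta = k, delta
    return best_index, best_delta


# Hypothetical per-window scores for one word (assumed values, not from the paper).
scores = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
index, delta = best_change_point(scores)
```

Under this simple criterion, a word whose score jumps sharply between two windows yields a large delta at that window, whereas gradual drift produces a smaller delta spread across several candidate splits.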

Paper Structure (8 sections, 1 equation, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Top words of two topics, at four windows evenly spread across our temporal range, illustrating the semantic shift of manus (hand).
  • Figure 2: Sum of deltas (bimodal shift) for words with their change-point in the given window.
  • Figure 3: Counts of Christian and pagan authors binned into 10 ranges according to their novelty.
  • Figure 4: All author novelties in descending order, indicating the position of several authors singled out by Auerbach. Darker colors correspond to earlier windows.
  • Figure 5: Words sorted by degree of bimodal shift (their change-point delta).
  • ...and 1 more figure