Dynamic embedded topic models and change-point detection for exploring literary-historical hypotheses
Hale Sirin, Tom Lippincott
TL;DR
The paper presents a framework that combines a dynamic embedded topic model with change-point detection to study diachronic change in lexical semantic modality in Classical and early Christian Latin, using bimodality as a core signal. It uses a lemmatized Latin corpus derived from Perseus, divided into 75-year windows, to track topic distributions and semantic shifts, and it introduces a bimodality metric and an author-novelty measure via Jensen-Shannon divergence. Case studies of words such as manus, figura, and effigies reveal both rapid and gradual patterns of diffusion, as well as a broad decline in modality-shift effects around 200 CE, consistent with a move toward standardization in early Christian rhetoric. The work offers a practical unsupervised approach for surfacing linguistically and historically meaningful patterns, with possible extensions to broader or noisier corpora and a user-friendly interface for humanists.
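To make the author-novelty idea concrete, here is a minimal Python sketch. It assumes each author is represented by a topic distribution produced by the model, and that novelty is the Jensen-Shannon divergence between that distribution and the mean distribution of authors in the preceding 75-year window. The function names `js_divergence` and `author_novelty` and the choice of baseline are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence in base 2, so the result lies in [0, 1]."""
    # Smooth with a tiny epsilon so zero-probability topics do not produce log(0),
    # then renormalize both inputs to proper probability distributions.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence KL(a || b) in bits.
        return float(np.sum(a * np.log2(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def author_novelty(author_dist, prior_dists):
    """Hypothetical novelty score: JSD between one author's topic distribution
    and the mean topic distribution of authors in the preceding window."""
    baseline = np.mean(np.asarray(prior_dists, dtype=float), axis=0)
    return js_divergence(author_dist, baseline)


# Example: an author whose topic mix matches the preceding window scores near 0,
# while one concentrated on a rarely used topic scores higher.
prior = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
print(author_novelty([0.6, 0.25, 0.15], prior))
print(author_novelty([0.05, 0.05, 0.9], prior))
```

Base-2 logarithms keep scores on a fixed 0 to 1 scale, which makes novelty values comparable across windows; scipy.spatial.distance.jensenshannon offers a library alternative, though it returns the square root of the divergence (a distance).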
Abstract
We present a novel combination of dynamic embedded topic models and change-point detection to explore diachronic change of lexical semantic modality in Classical and early Christian Latin. We demonstrate several methods for finding and characterizing patterns in the output, and for relating them to traditional scholarship in Comparative Literature and Classics. This simple approach to unsupervised models of semantic change can be applied to any suitable corpus, and we conclude with future directions and refinements intended to allow noisier, less-curated materials to meet that threshold.
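As an illustration of change-point detection on a per-window series (for example, a word's bimodality score across successive 75-year windows), the following minimal sketch finds the single split that best separates the series into two constant-mean segments by least squares. This is a generic mean-shift detector chosen for illustration; the helper name `best_change_point` is hypothetical, and the paper's actual detector may differ.

```python
import numpy as np


def best_change_point(series, min_size=2):
    """Return (index, cost) for the single split that minimizes the total
    within-segment squared error, i.e. the best mean-shift change point.

    The returned index k means the new regime starts at series[k].
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    if n < 2 * min_size:
        raise ValueError("series too short for the requested min_size")

    best_k, best_cost = None, np.inf
    # Try every split that leaves at least min_size points on each side.
    for k in range(min_size, n - min_size + 1):
        left, right = x[:k], x[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, float(cost)
    return best_k, best_cost


# Hypothetical per-window scores that jump after the third window.
k, cost = best_change_point([0.1, 0.1, 0.1, 0.9, 0.9, 0.9])
print(k, cost)
```

For a series that jumps from 0.1 to 0.9 after the third window, the function returns index 3. Multiple change points can be found by applying the same split recursively to each segment (binary segmentation); dedicated packages such as ruptures provide more robust detectors and penalty-based model selection.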
