(Not) Understanding Latin Poetic Style with Deep Learning
Ben Nagy
TL;DR
This work investigates whether modern neural networks can reveal how authors differ in their Latin poetic style, focusing on sonic and metrical features in hexameter verse. It compares CNNs and LSTMs on syllable-level encodings, showing that CNNs with trainable embeddings achieve high classification accuracy (approximately $97$–$98\%$) and train faster than LSTMs, while inputs restricted to basic metrical or phonetic features remain informative but perform slightly worse. Despite this strong performance, interpretability lags: attention visualizations do not yield reliable insights into which stylistic cues the models rely on, especially at the dense layer, and visualizations of the deeper layers are inconsistent across architectures. The paper argues for exploring improved embedding spaces, similarity measures for authorship, and alternative interpretability approaches, acknowledging that quantitative success does not yet translate into a human-understandable account of poetic style. The work provides a candid, data-driven assessment of current limits and suggests concrete avenues for advancing computational poetics.
Abstract
This article summarizes some mostly unsuccessful attempts to understand authorial style by examining the attention of various neural networks (LSTMs and CNNs) trained on a corpus of classical Latin verse that has been encoded to include sonic and metrical features. Carefully configured neural networks are shown to be extremely strong authorship classifiers, so it is hoped that they might teach 'traditional' readers something about how the authors differ in style. Sadly, their reasoning is, so far, inscrutable. While the overall goal has not yet been reached, this work reports some useful findings on effective ways to encode and embed verse, the relative strengths and weaknesses of the two neural network families, and useful (and not so useful) techniques for designing and inspecting NN models in this domain. This article suggests that, for poetry, CNNs are better choices than LSTMs: they train more quickly, have equivalent accuracy, and (potentially) offer better interpretability. Based on a great deal of experimentation, it also suggests that simple, trainable embeddings are more effective than domain-specific schemes, and stresses the importance of techniques to reduce overfitting, such as dropout and batch normalization.
