(Not) Understanding Latin Poetic Style with Deep Learning

Ben Nagy

TL;DR

This work investigates whether modern neural networks can reveal how authors differ in their Latin poetic style, focusing on sonic and metrical features in hexameter verse. It compares CNNs and LSTMs on syllable-level encodings, showing CNNs with trainable embeddings achieve high classification accuracy (approximately $97$–$98\%$) and train faster than LSTMs, while basic metrical or phonetic-only inputs remain informative but slightly weaker. Despite strong performance, the study finds that interpretability lags: attention visualizations do not yield reliable insights into what stylistic cues the models rely on, especially at the dense layer, and deep layer visualizations remain inconsistent across architectures. The paper argues for exploring improved embedding spaces, similarity measures for authorship, and alternative interpretability approaches, acknowledging that quantitative success does not yet translate into a human-understandable account of poetic style. The work provides a candid, data-driven assessment of current limits and suggests concrete avenues for advancing computational poetics.

Abstract

This article summarizes some mostly unsuccessful attempts to understand authorial style by examining the attention of various neural networks (LSTMs and CNNs) trained on a corpus of classical Latin verse that has been encoded to include sonic and metrical features. Carefully configured neural networks are shown to be extremely strong authorship classifiers, so it is hoped that they might therefore teach 'traditional' readers something about how the authors differ in style. Sadly, their reasoning is, so far, inscrutable. While the overall goal has not yet been reached, this work reports some useful findings in terms of effective ways to encode and embed verse, the relative strengths and weaknesses of the neural network families, and useful (and not so useful) techniques for designing and inspecting NN models in this domain. This article suggests that, for poetry, CNNs are better choices than LSTMs: they train more quickly, have equivalent accuracy, and (potentially) offer better interpretability. Based on a great deal of experimentation, it also suggests that simple, trainable embeddings are more effective than domain-specific schemes, and stresses the importance of techniques to reduce overfitting, like dropout and batch normalization.
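
To make the architecture described in the abstract more concrete, here is a minimal NumPy sketch of the forward pass of a small CNN classifier over syllable tokens: embedding lookup, 1D convolution, global max pooling, dropout, and a softmax over authors. It is an illustration only, not the paper's model. All sizes, names (`forward`, `VOCAB_SIZE`, and so on) and the random weights are assumptions made for this example, batch normalization is left out for brevity, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not the paper's settings).
VOCAB_SIZE = 500   # number of distinct syllable tokens
EMBED_DIM = 16     # width of the embedding vectors
KERNEL = 3         # convolution window, in syllables
FILTERS = 8        # number of convolution filters
N_AUTHORS = 4      # number of candidate authors

# In a real model these would be learned by gradient descent; here they are random.
embedding = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))
conv_w = rng.normal(size=(KERNEL, EMBED_DIM, FILTERS)) * 0.1
conv_b = np.zeros(FILTERS)
dense_w = rng.normal(size=(FILTERS, N_AUTHORS)) * 0.1
dense_b = np.zeros(N_AUTHORS)


def forward(token_ids, drop_rate=0.0):
    """Return a probability distribution over authors for one token sequence."""
    x = embedding[token_ids]                                  # (seq_len, EMBED_DIM)
    n_windows = x.shape[0] - KERNEL + 1
    # Stack every window of KERNEL consecutive syllables.
    windows = np.stack([x[i:i + KERNEL] for i in range(n_windows)])
    # 1D convolution: each filter scores each window.
    conv = np.einsum("tke,kef->tf", windows, conv_w) + conv_b
    conv = np.maximum(conv, 0.0)                              # ReLU
    pooled = conv.max(axis=0)                                 # global max pooling
    if drop_rate > 0.0:
        # Inverted dropout: zero some units and rescale the survivors.
        keep = rng.random(pooled.shape) >= drop_rate
        pooled = pooled * keep / (1.0 - drop_rate)
    logits = pooled @ dense_w + dense_b
    exp = np.exp(logits - logits.max())                       # stable softmax
    return exp / exp.sum()


# A made-up sequence of 15 syllable token ids standing in for one line of verse.
tokens = rng.integers(0, VOCAB_SIZE, size=15)
probs = forward(tokens)
print(probs.shape, float(probs.sum()))
```

Because the embedding table is just another weight matrix, a framework would update it along with the convolution and dense weights during training, which is the sense in which the embeddings are "trainable" rather than hand-designed.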

Paper Structure (17 sections, 9 figures, 1 table)

This paper contains 17 sections, 9 figures, 1 table.

Introduction
Poetic Corpus
Methods
Care and Feeding of Neural Networks
Encoding and Embedding
Controlling Overfitting
Results
Classification Results
Visualising Attention
Problems at the dense layer
Problems everywhere else as well
Future Work
Conclusions
Availability of Data and Code
Model Configurations
...and 2 more sections

Figures (9)

Figure 1: An example from the tf-keras-vis documentation of ScoreCam working as intended, on images.
Figure 2: The stages of transformation from raw verse, to phonetic orthography, to syllable tokens with metrical metadata.
Figure 3: Do the embeddings carry meaningful poetic information? The embedding space is projected to 2D using UMAP, and then density estimates are plotted for syllables with certain poetic features to determine whether they cluster.
Figure 4: LSTM attention from the VanillaSaliency visualizer, on 64-line chunks. The vanishing gradient problem is clearly visible.
Figure 5: LSTM attention in detail, showing Vergil Aeneid 7.200–64. Some work-specific named entities appear to have high salience (Mycenis, Dardana, Troius…).
...and 4 more figures
