Lacuna Language Learning: Leveraging RNNs for Ranked Text Completion in Digitized Coptic Manuscripts

Lauren Levine; Cindy Tung Li; Lydia Bremer-McCollum; Nicholas Wagner; Amir Zeldes

TL;DR

The paper tackles lacuna reconstruction in Coptic manuscripts with a bidirectional character-level RNN, trained with masked language modeling, that predicts missing characters and ranks candidate reconstructions. The best model reaches 72% accuracy on single-character gaps but only about 37% on lacunae of various lengths, so it is not suitable for definitive reconstruction; its value lies in ranking candidate reconstructions by likelihood to guide scholarly judgment. Two case studies on early Coptic manuscripts show how these rankings can complement traditional textual criticism as a quantitative supplement to expert analysis. The authors also outline directions for future improvement and broader application.

Abstract

Ancient manuscripts are frequently damaged, containing gaps in the text known as lacunae. In this paper, we present a bidirectional RNN model for character prediction of Coptic characters in manuscript lacunae. Our best model performs with 72% accuracy on single character reconstruction, but falls to 37% when reconstructing lacunae of various lengths. While not suitable for definitive manuscript reconstruction, we argue that our RNN model can help scholars rank the likelihood of textual reconstructions. As evidence, we use our RNN model to rank reconstructions in two early Coptic manuscripts. Our investigation shows that neural models can augment traditional methods of textual restoration, providing scholars with an additional tool to assess lacunae in Coptic manuscripts.
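
The ranking idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it swaps the bidirectional RNN for a smoothed character trigram scorer, uses a toy English corpus instead of Coptic data, and assumes a single contiguous gap marked with underscores. The names `train_char_trigram`, `log_prob`, and `rank_reconstructions` are hypothetical.

```python
import math
from collections import Counter


def train_char_trigram(corpus: str):
    """Count character trigrams and their two-character contexts.

    Stand-in scorer only: the paper uses a bidirectional RNN, not a trigram model.
    """
    padded = "^^" + corpus + "$"
    trigrams = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    contexts = Counter(padded[i:i + 2] for i in range(len(padded) - 2))
    vocab = set(padded)
    return trigrams, contexts, vocab


def log_prob(text: str, model) -> float:
    """Sum add-one-smoothed trigram log-probabilities over the whole string."""
    trigrams, contexts, vocab = model
    padded = "^^" + text + "$"
    total = 0.0
    for i in range(len(padded) - 2):
        tri = padded[i:i + 3]
        total += math.log((trigrams[tri] + 1) / (contexts[tri[:2]] + len(vocab)))
    return total


def rank_reconstructions(damaged: str, candidates, model, gap: str = "_"):
    """Fill the gap with each candidate and sort by descending log-probability.

    Assumes one contiguous run of `gap` characters; candidates whose length
    does not match the gap are skipped.
    """
    n_gap = damaged.count(gap)
    scored = []
    for cand in candidates:
        if len(cand) != n_gap:
            continue
        filled = damaged.replace(gap * n_gap, cand, 1)
        scored.append((cand, log_prob(filled, model)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy corpus and candidates, chosen purely for illustration.
    model = train_char_trigram("the cat sat on the mat. the cat ate the rat.")
    for cand, score in rank_reconstructions("the c__ sat", ["zq", "at", "xyz"], model):
        print(cand, round(score, 3))
```

The output is an ordering of candidates rather than a single answer, which mirrors the abstract's framing: the scores are a quantitative aid for comparing proposed reconstructions, not a definitive restoration.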

Paper Structure

This paper contains 15 sections, 3 figures, 2 tables.

Introduction
Background and Related Work
Coptic
Manuscript Reconstruction
Masked Language Models
Data
Model Architecture
Evaluation
Baselines
RNN Evaluation
Relative Ranking
Case Studies
Isaiah 37:24
The Nag Hammadi Library – Gospel of Philip
Conclusion

Figures (3)

Figure 1: Model architecture and preprocessing
Figure 2: Accuracy of the various model configurations and tri-gram baseline relative to lacuna length in characters
Figure 3: P.Duk. inv. 282 fr. B verso
