DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning
Edward C. Williams, Grace Su, Sandra R. Schloen, Miller C. Prosser, Susanne Paulus, Sanjay Krishnan
TL;DR
DeepScribe is a modular deep-learning pipeline that localizes Elamite cuneiform signs in images of Persepolis tablets and proposes an identity for each sign, trained on the richly annotated Persepolis Fortification Archive (PFA). A RetinaNet-based sign detector and a ResNet-based sign classifier perform well on their individual subtasks (a localization mAP of 0.78 and a top-5 classification accuracy of 0.89), and a Sequential RANSAC line detector groups the detected signs into lines to support an end-to-end transliteration framework. Although the sign-level components perform well on ground-truth regions, end-to-end transcription remains harder without linguistic context (top-5 accuracy drops to 0.80), which points to the need for language models or context-aware priors. The work demonstrates practical utility for scholars, provides a data/code release, and outlines a roadmap toward linguistically-aware transliteration and application of the approach to other periods of cuneiform writing.
Abstract
Twenty-five hundred years ago, the paperwork of the Achaemenid Empire was recorded on clay tablets. In 1933, archaeologists from the University of Chicago's Oriental Institute (OI) found tens of thousands of these tablets and fragments during the excavation of Persepolis. Many of these tablets have been painstakingly photographed and annotated by expert cuneiformists, and now provide a rich dataset consisting of over 5,000 annotated tablet images and 100,000 cuneiform sign bounding boxes. We leverage this dataset to develop DeepScribe, a modular computer vision pipeline capable of localizing cuneiform signs and providing suggestions for the identity of each sign. We investigate the difficulty of learning subtasks relevant to cuneiform tablet transcription on ground-truth data, finding that a RetinaNet object detector can achieve a localization mAP of 0.78 and a ResNet classifier can achieve a top-5 sign classification accuracy of 0.89. The end-to-end pipeline achieves a top-5 classification accuracy of 0.80. As part of the classification module, DeepScribe groups cuneiform signs into morphological clusters. We consider how this automatic clustering approach differs from the organization of standard, printed sign lists and what we may learn from it. These components, trained individually, are sufficient to produce a system that can analyze photos of cuneiform tablets from the Achaemenid period and provide useful transliteration suggestions to researchers. We evaluate the model's end-to-end performance on locating and classifying signs, providing a roadmap to a linguistically-aware transliteration system, then consider the model's potential utility when applied to other periods of cuneiform writing.
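Since the abstract reports results as top-5 classification accuracy, the following is a minimal sketch of how such a metric is commonly computed. It is not taken from the DeepScribe code release: the function name `top_k_accuracy`, the toy scores, and the six-class setup are all hypothetical, chosen only to illustrate the definition.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5
) -> float:
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy data (hypothetical): 3 sign crops scored against 6 sign classes.
toy_scores = [
    [0.04, 0.10, 0.50, 0.20, 0.11, 0.05],
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],
    [0.01, 0.02, 0.03, 0.04, 0.10, 0.80],
]
toy_labels = [3, 5, 4]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top5 = top_k_accuracy(toy_scores, toy_labels, k=5)
```

In this toy case the true class is never the single best guess but is among the five best guesses for two of the three crops, which is why a top-5 figure suits a tool that offers a shortlist of candidate signs to a researcher rather than a single answer.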
