PyPotteryLens: An Open-Source Deep Learning Framework for Automated Digitisation of Archaeological Pottery Documentation
Lorenzo Cardarelli
TL;DR
The paper tackles the challenge of unlocking legacy pottery data by introducing PyPotteryLens, an open-source deep learning framework that automates the digitisation of pottery drawings using YOLO-based instance segmentation and a multi-head EfficientNetV2 classifier. Both are accessible through a Gradio user interface and equipped with a self-annotation loop. The framework performs well across diverse contexts (e.g., $\text{mAP}_{50\text{-}95} \approx 0.97$ for segmentation, and precision and recall $\approx 0.97$–$0.99$) and delivers substantial workflow time savings (up to ~20×) relative to manual digitisation, while preserving expert oversight. Its modular, extensible architecture supports future enhancements such as style transfer for better generalisation, automated metadata extraction, and expansion to other archaeological materials, marking a significant advance in digital heritage preservation and reproducible computational archaeology. By processing thousands of pottery instances and enabling unsupervised analysis through learned representations, PyPotteryLens offers a scalable, open pathway to data-rich archaeological interpretation and methodological integration.
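For reference, the segmentation figure quoted above is presumably the COCO-style metric reported by YOLO tooling. Average precision (AP) is computed at ten mask-IoU thresholds from 0.50 to 0.95 in steps of 0.05, then averaged over thresholds and over the set of classes $C$. The definition below is the standard one, stated here as an assumption rather than taken from the paper's evaluation code:

$$
\mathrm{IoU}(P, G) = \frac{|P \cap G|}{|P \cup G|}, \qquad
\text{mAP}_{50\text{-}95} = \frac{1}{|C|} \sum_{c \in C} \frac{1}{10} \sum_{t \in \{0.50,\,0.55,\,\dots,\,0.95\}} \mathrm{AP}_c(t)
$$

Here $P$ and $G$ are a predicted and a ground-truth mask, and $\mathrm{AP}_c(t)$ is the area under the precision-recall curve for class $c$ when a prediction counts as correct only if its IoU with a ground-truth instance is at least $t$.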
Abstract
The documentation and study of archaeological pottery are a crucial but time-consuming part of archaeological work. While recent years have seen advances in digital documentation methods, vast amounts of legacy data remain locked in traditional publications. This paper introduces PyPotteryLens, an open-source framework that leverages deep learning to automate the digitisation and processing of archaeological pottery drawings from published sources. The system combines state-of-the-art computer vision models (YOLO for instance segmentation and EfficientNetV2 for classification) with an intuitive user interface, making advanced digital methods accessible to archaeologists regardless of technical expertise. The framework achieves over 97% precision and recall in pottery detection and classification tasks, while reducing processing time by a factor of 5 to 20 compared to manual methods. Testing across diverse archaeological contexts demonstrates robust generalisation. Moreover, the system's modular architecture facilitates extension to other archaeological materials, while its standardised output format ensures the long-term preservation and reusability of digitised data and provides a solid basis for training machine learning algorithms. The software, documentation, and examples are available on GitHub (https://github.com/lrncrd/PyPottery/tree/PyPotteryLens).
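Detection precision and recall depend on how predicted instances are matched to annotated ones. The sketch below illustrates one common convention, assumed here rather than taken from PyPotteryLens. Masks are represented as sets of pixel coordinates. Each prediction is greedily matched to the unmatched ground-truth mask with the highest IoU, and the match counts as a true positive only if that IoU reaches a threshold. The helpers `mask_iou`, `precision_recall` and `square` are hypothetical names used for illustration only.

```python
from typing import FrozenSet, List, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]  # a binary mask as the set of pixel coordinates it covers


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def precision_recall(preds: List[Mask], gts: List[Mask],
                     iou_thr: float = 0.5) -> Tuple[float, float]:
    """Greedy one-to-one matching at a fixed IoU threshold.

    Assumption: preds are already sorted by descending confidence,
    as standard evaluators do before matching.
    """
    matched = set()  # indices of ground-truth masks already claimed
    tp = 0
    for p in preds:
        best_j, best_iou = -1, iou_thr
        for j, g in enumerate(gts):
            if j in matched:
                continue
            iou = mask_iou(p, g)
            if iou >= best_iou:  # keep the highest-IoU candidate above threshold
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # predictions with no acceptable match
    fn = len(gts) - tp    # annotated instances that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall


def square(x0: int, y0: int, size: int) -> Mask:
    """Hypothetical toy mask: a filled square of side `size`."""
    return frozenset((x, y) for x in range(x0, x0 + size)
                     for y in range(y0, y0 + size))


# Toy example: one prediction overlaps an annotation (IoU = 90/110), one is spurious.
gts = [square(0, 0, 10), square(20, 20, 10)]
preds = [square(1, 0, 10), square(50, 50, 5)]
for thr in (0.5, 0.9):
    print(thr, precision_recall(preds, gts, thr))
# 0.5 (0.5, 0.5)
# 0.9 (0.0, 0.0)
```

Raising the threshold turns the same overlap from a true positive into a miss, which is why metrics averaged over several IoU thresholds are stricter than a single-threshold score.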
