Neural Style Transfer for Synthesising a Dataset of Ancient Egyptian Hieroglyphs
Lewis Matheson Creed
TL;DR
This work demonstrates that Neural Style Transfer (NST) can serve as an effective data augmentation method for generating a large dataset from a digital hieroglyph typeface, addressing data scarcity in machine learning tasks on ancient Egyptian. By pairing a real-world G17 style set with the J-Sesh content typeface, the NST pipeline produces 175 variations across 34 classes, enabling GlyphNet to achieve near-perfect intra-dataset accuracy ($\approx 0.99$) and strong transferability to real hieroglyph images (G17, $\approx 0.74$ accuracy). The NST-trained models outperform non-augmented font baselines and perform comparably to state-of-the-art real-data models on transfer tests, highlighting NST as a practical data augmentation strategy for low-resource scripts. The study also discusses limitations (burn-in artifacts, hyper-parameter sensitivity) and outlines future work to cover all Gardiner signs and broader styles, potentially enabling open benchmark datasets for ancient Egyptian hieroglyphs.
Abstract
The limited availability of training data for low-resource languages makes applying machine learning techniques challenging. Ancient Egyptian is one such language. However, innovative applications of data augmentation methods, such as Neural Style Transfer (NST), could overcome these barriers. This paper presents a novel method for generating datasets of ancient Egyptian hieroglyphs by applying NST to a digital typeface. Experimental results show that image classification models trained on NST-generated examples and models trained on photographs demonstrate equal performance and transferability to real, unseen images of hieroglyphs.
