Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time
Marisa Hudspeth, Brendan O'Connor, Laure Thompson
TL;DR
This study addresses cross-time and cross-genre morphological tagging for Latin. It consolidates five UD treebanks with LASLA through automated harmonization, converting their annotations to the conventions of standard Latin grammar. It adds time- and genre-aware metadata, constructs cross-time data splits, and shows that a LatinBERT-based tagger with separate feature heads outperforms existing taggers while remaining robust to cross-domain shifts. Key contributions include a detailed annotation agreement analysis, a standardized tagset aligned with traditional Latin grammar, and cross-time experiments that reveal where data harmonization helps or hurts. The work lays groundwork for more reliable Latin NLP across historical varieties and informs dataset curation, model design, and evaluation strategies in the digital humanities.
Abstract
Existing Latin treebanks draw from Latin's long written tradition, spanning 17 centuries and a variety of cultures. Recent efforts have begun to harmonize these treebanks' annotations to better train and evaluate morphological taggers. However, the heterogeneity of these treebanks must be carefully considered to build effective and reliable datasets. In this work, we review existing Latin treebanks to identify the texts they draw from, identify their overlap, and document their coverage across time and genre. We additionally design automated conversions of their morphological feature annotations into the conventions of standard Latin grammar. From this, we build new time-period data splits that draw from the existing treebanks, which we use to perform a broad cross-time analysis for POS and morphological feature tagging. We find that BERT-based taggers outperform existing taggers while also being more robust to cross-domain shifts.
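
To make the two data-preparation steps above concrete, the following is a minimal Python sketch of (a) rewriting UD-style morphological feature strings through a remapping table and (b) grouping sentences into time-period splits. It is an illustration under stated assumptions, not the conversion used in this work: the `REMAP` rules, the function names, and the `period` metadata key are all hypothetical, and a real conversion would also need to resolve conflicts when two source features map to the same target feature.

```python
from collections import defaultdict

# Hypothetical remapping rules (illustrative only, not the paper's tables):
# (feature, value) in a source treebank -> harmonized (feature, value).
REMAP = {
    ("Aspect", "Perf"): ("Tense", "Perf"),
    ("Aspect", "Imp"): ("Tense", "Imp"),
}


def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS string such as 'Case=Nom|Number=Sing'."""
    if feats in ("", "_"):  # '_' marks an empty FEATS column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats: dict[str, str]) -> str:
    """Serialize features back to a FEATS string, sorted by feature name."""
    if not feats:
        return "_"
    ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


def harmonize(feats: str, remap=REMAP) -> str:
    """Rewrite each feature-value pair through the remapping table."""
    out = {}
    for name, value in parse_feats(feats).items():
        new_name, new_value = remap.get((name, value), (name, value))
        out[new_name] = new_value  # later pairs overwrite earlier ones
    return format_feats(out)


def split_by_period(sentences):
    """Group sentences by a 'period' label assumed to be in their metadata."""
    splits = defaultdict(list)
    for sent in sentences:
        splits[sent["period"]].append(sent)
    return dict(splits)
```

For example, `harmonize("Aspect=Perf|Case=Nom")` rewrites the aspect feature under the hypothetical rule above and leaves `Case=Nom` unchanged, while `split_by_period` keeps every sentence in exactly one split so that cross-time evaluation can train on one period and test on another.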
