Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time

Marisa Hudspeth, Brendan O'Connor, Laure Thompson

TL;DR

This study addresses cross-time and cross-genre morphological tagging for Latin by consolidating five UD treebanks and LASLA, harmonizing their annotations automatically and converting them to the conventions of standard Latin grammar. It introduces time- and genre-aware metadata, constructs cross-time data splits, and demonstrates that a LatinBERT-based tagger with separate feature heads achieves state-of-the-art performance while remaining robust to cross-domain shifts. Key contributions include a detailed annotation agreement analysis, a standardized tagset aligned with traditional Latin grammar, and cross-time experiments that show where data harmonization helps or hurts. The work lays groundwork for more reliable Latin NLP across historical varieties and informs dataset curation, model design, and evaluation strategies in digital humanities contexts.

Abstract

Existing Latin treebanks draw from Latin's long written tradition, spanning 17 centuries and a variety of cultures. Recent efforts have begun to harmonize these treebanks' annotations to better train and evaluate morphological taggers. However, the heterogeneity of these treebanks must be carefully considered to build effective and reliable data. In this work, we review existing Latin treebanks to identify the texts they draw from, identify their overlap, and document their coverage across time and genre. We additionally design automated conversions of their morphological feature annotations into the conventions of standard Latin grammar. From this, we build new time-period data splits that draw from the existing treebanks, which we use to perform a broad cross-time analysis for POS and morphological feature tagging. We find that BERT-based taggers outperform existing taggers while also being more robust to cross-domain shifts.
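
As a rough illustration of what a time-period data split involves, the sketch below groups sentences by century into broad periods and enumerates train/test period pairs for a cross-time evaluation. The period names, the century cutoffs, and the function names are all hypothetical; this summary does not state the boundaries the authors actually use, so treat this as a sketch under those assumptions rather than the paper's procedure.

```python
from collections import defaultdict

# Hypothetical period boundaries, for illustration only. The actual cutoffs
# are not given in this summary. Centuries are signed integers: negative for
# BCE, positive for CE. There is no century 0.
PERIODS = [
    ("early", -3, 2),   # assumed range
    ("middle", 3, 8),   # assumed range
    ("late", 9, 14),    # assumed range
]


def period_of(century, periods=PERIODS):
    """Return the name of the period that contains the given century."""
    if century == 0:
        raise ValueError("there is no century 0")
    for name, start, end in periods:
        if start <= century <= end:
            return name
    raise ValueError(f"century {century} is outside all configured periods")


def split_by_period(sentences, periods=PERIODS):
    """Group (sentence_id, century) pairs into a dict of period -> ids."""
    splits = defaultdict(list)
    for sent_id, century in sentences:
        splits[period_of(century, periods)].append(sent_id)
    return dict(splits)


def cross_time_pairs(splits):
    """List every (train_period, test_period) pair, including same-period pairs."""
    names = sorted(splits)
    return [(train, test) for train in names for test in names]
```

With three periods this yields nine train/test combinations, so in-period and out-of-period tagging accuracy can be compared side by side.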

Paper Structure (34 sections, 4 figures, 13 tables)

Figures (4)

  • Figure 1: From our curated metadata (see the data section), the number of sentences per century (3rd century BCE to 14th century CE) across the 5 UD treebanks and LASLA, shown with three broad time periods.
  • Figure 2: Number of sentences in the UD treebanks per century, colored by genre.
  • Figure 3: Example of how a token's set of morphological features changes after standardization, from Cicero’s Letters to Atticus Book 3 Letter 9.
  • Figure 4: Example of an error in the model's prediction due to acontextual ambiguity, from Cicero’s Letters to Atticus Book 3 Letter 9.
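
To make the kind of change shown in Figure 3 concrete, here is a minimal sketch that parses two UD-style FEATS strings (the `Key=Value|Key=Value` format of the CoNLL-U FEATS column, with `_` for an empty set) and reports which features were removed, added, or changed. The helper names and the example feature strings are hypothetical and are not taken from the paper's conversion rules.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into a dict; '_' or '' means no features."""
    if feats in ("_", ""):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def diff_feats(before, after):
    """Compare two FEATS strings and report removed, added, and changed features."""
    b, a = parse_feats(before), parse_feats(after)
    return {
        "removed": {k: b[k] for k in b.keys() - a.keys()},
        "added": {k: a[k] for k in a.keys() - b.keys()},
        "changed": {k: (b[k], a[k]) for k in b.keys() & a.keys() if b[k] != a[k]},
    }


# Hypothetical example: a treebank-specific feature is dropped after standardization.
example = diff_feats(
    "Case=Nom|Gender=Fem|InflClass=IndEurA|Number=Sing",
    "Case=Nom|Gender=Fem|Number=Sing",
)
```

Running the same comparison over every token of a treebank would give a per-feature tally of what a conversion adds, drops, or rewrites.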