ÚFAL LatinPipe at EvaLatin 2024: Morphosyntactic Analysis of Latin
Milan Straka, Jana Straková, Federica Gamba
TL;DR
This paper presents ÚFAL LatinPipe, a Transformer-based, graph-based dependency parser that jointly models Latin dependency parsing and morphosyntactic tagging. It achieves state-of-the-art UPOS/UFeats performance by concatenating embeddings from multiple fine-tuned LMs, adding BiLSTM contextualization, employing gold UPOS as input, and ensembling seven models, all trained on seven Latin treebanks with harmonized annotations. Key innovations include an initial training phase with frozen pre-trained weights, multi-treebank sampling, PROIEL annotation harmonization, and handling of punctuation mismatches between training and test data, yielding strong results on the UD 2.13 and EvaLatin 2024 benchmarks. The work demonstrates the practicality of LM concatenation and ensembling for Latin morphosyntax, provides a foundation for UDPipe 3, and releases its code for reproducibility.
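The ensembling idea can be illustrated with a minimal sketch. It assumes that each model outputs a per-token probability distribution over tags; the distributions are averaged across models and the highest-scoring tag is chosen for each token. The function names `average_distributions` and `ensemble_predict`, the three toy models, and their probabilities are hypothetical. The sketch covers only tag prediction, not how the submitted system decodes dependency trees from ensembled scores.

```python
from typing import List, Sequence


def average_distributions(
    model_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average per-token distributions; model_probs[m][t][c] is model m's
    probability of class c for token t (hypothetical layout)."""
    n_models = len(model_probs)
    n_tokens = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    averaged = []
    for t in range(n_tokens):
        averaged.append([
            sum(model_probs[m][t][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ])
    return averaged


def ensemble_predict(model_probs, labels: Sequence[str]) -> List[str]:
    """Pick the label with the highest averaged probability for each token."""
    averaged = average_distributions(model_probs)
    return [labels[max(range(len(dist)), key=dist.__getitem__)] for dist in averaged]


labels = ["NOUN", "VERB", "ADJ"]
# Three hypothetical models, each scoring the same two tokens.
model_probs = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]],
    [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]],
]
print(ensemble_predict(model_probs, labels))  # ['NOUN', 'ADJ']
```

In this toy example, the second model alone would tag the first token as VERB and the first model alone would tag the second token as VERB; averaging the distributions lets the combined evidence decide each token.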
Abstract
We present LatinPipe, the winning submission to the EvaLatin 2024 Dependency Parsing shared task. Our system consists of a fine-tuned concatenation of base and large pre-trained LMs, with a dot-product attention head for parsing and softmax classification heads for morphology, jointly learning both dependency parsing and morphological analysis. It is trained by sampling from seven publicly available Latin corpora, with additional harmonization of annotations to achieve a more unified annotation style. Before fine-tuning, we train the system for a few initial epochs with the pre-trained weights frozen. We also add local relative contextualization by stacking BiLSTM layers on top of the Transformer(s). Finally, we ensemble the output probability distributions of seven randomly initialized networks for the final submission. The code is available at https://github.com/ufal/evalatin2024-latinpipe.
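A dot-product attention head for parsing can be sketched as follows, under stated assumptions: each token's contextual vector is projected into a query and a key (the matrices `w_query` and `w_key` are hypothetical stand-ins for learned weights), every dependent scores every candidate head, including an artificial ROOT at position 0, by a dot product, and a softmax turns those scores into a distribution over heads. The greedy argmax decoding is a simplification; a real parser would typically decode a well-formed tree, for example with a maximum spanning tree algorithm, and LatinPipe's exact scoring details may differ.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax; -inf scores receive probability 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_distributions(hidden: List[Vector], w_query: Matrix, w_key: Matrix) -> List[Vector]:
    """hidden[0] is an artificial ROOT, hidden[1:] are the words.
    Returns, for each word i >= 1, a distribution over heads j in 0..n."""
    queries = [matvec(w_query, h) for h in hidden]
    keys = [matvec(w_key, h) for h in hidden]
    dists = []
    for i in range(1, len(hidden)):
        # Dot-product score of dependent i against every candidate head j.
        scores = [sum(q * k for q, k in zip(queries[i], keys[j])) for j in range(len(hidden))]
        scores[i] = -math.inf  # a word cannot be its own head
        dists.append(softmax(scores))
    return dists


def greedy_heads(dists: List[Vector]) -> List[int]:
    """Simplified decoding: most probable head per word (not guaranteed to be a tree)."""
    return [max(range(len(d)), key=d.__getitem__) for d in dists]


identity = [[1.0, 0.0], [0.0, 1.0]]               # hypothetical "learned" projections
hidden = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # ROOT, word 1, word 2 (toy vectors)
dists = head_distributions(hidden, identity, identity)
print(greedy_heads(dists))  # [2, 0]
```

Here word 1 attaches to word 2, and word 2 attaches to ROOT, forming a valid two-word tree.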
