PrOnto: Language Model Evaluations for 859 Languages

Luke Gessler

TL;DR

PrOnto tackles the scarcity of multilingual evaluation data by projecting OntoNotes' New Testament annotations into 859 languages via verse alignment, creating a scalable evaluation resource for pretrained language models. The approach centers on five annotation-projection tasks implemented as sequence classification, evaluated across a diverse set of languages and pretrained models using a standardized training setup. The results show that the projected tasks are meaningful proxies for model quality across languages with varying typological distance from English, and that the resource remains useful for high-, medium-, and low-resource settings. The work further provides a practical pipeline and encourages community contributions to extend the dataset and potentially derive typological distance insights from projection errors.

Abstract

Evaluation datasets are critical resources for measuring the quality of pretrained language models. However, because dataset annotation is costly, these resources are scarce for most languages other than English, which makes it difficult to assess the quality of language models in those languages. In this work, we present a new method for evaluation dataset construction that enables any language with a New Testament translation to receive a suite of evaluation datasets suitable for pretrained language model evaluation. The method relies on aligning verses with those in the New Testament portion of English OntoNotes and then projecting annotations from English to the target language, with no manual annotation required. We apply this method to 1051 New Testament translations in 859 languages and make them publicly available. Additionally, we conduct experiments that demonstrate the efficacy of our method for creating evaluation tasks that can assess language model quality.
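Because both the English OntoNotes text and each translation are organized by verse, the alignment step can be pictured as a join on verse references, after which a verse-level label computed from the English annotations is copied onto the aligned target verse. The sketch below is a minimal illustration of that idea, not the paper's implementation: the `project_verse_labels` function, the `(book, chapter, verse)` key with USFM-style book codes, and the integer labels are all hypothetical assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical verse key: (book, chapter, verse), e.g. ("JHN", 11, 35).
VerseRef = Tuple[str, int, int]


def project_verse_labels(
    english_labels: Dict[VerseRef, int],
    target_verses: Dict[VerseRef, str],
) -> List[Tuple[VerseRef, str, int]]:
    """Pair each target-language verse with the label derived from the
    aligned English OntoNotes verse.

    Only verses present on both sides are kept; verses missing from
    either side cannot be aligned and are skipped.
    """
    shared = english_labels.keys() & target_verses.keys()
    # Sort for a deterministic output order.
    return [(ref, target_verses[ref], english_labels[ref]) for ref in sorted(shared)]
```

For example, with English labels for John 11:35 and Matthew 9:5 and a target text containing John 11:35 ("Jesús lloró.") and Luke 1:1, only John 11:35 is aligned, yielding a single labeled target-language example. Joining on verse references means no word-level alignment is needed, which is what allows a sketch like this to apply to any language with a verse-numbered translation.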

Paper Structure

This paper contains 33 sections, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: A sample verse, John 11:35, taken from OntoNotes. Note the annotations for tokenization, part-of-speech, constituency syntax, coreference, and argument structure. The file is in "OntoNotes Normal Form" (ONF), a human-readable format in which OntoNotes provides its annotations.
  • Figure 2: Matthew 9:5-6, as translated by the ERV (above) and the NRSVUE (below). In the ERV translation, verses 5 and 6 are fused: no boundary between them is indicated, and their contents have been reordered.
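Fused verses like those in Figure 2 cannot be aligned one-to-one with OntoNotes, since the boundary between verses 5 and 6 is gone. One simple policy, assumed here rather than taken from the paper, is to detect fused units from range-style labels such as "5-6" and set them aside, keeping only single verses for projection. The functions `parse_verse_label` and `split_fused` and the label format are hypothetical.

```python
import re
from typing import Dict, Set, Tuple


def parse_verse_label(label: str) -> Tuple[int, int]:
    """Parse a hypothetical verse label: '5' -> (5, 5), '5-6' -> (5, 6)."""
    m = re.fullmatch(r"(\d+)(?:-(\d+))?", label.strip())
    if m is None:
        raise ValueError(f"unrecognized verse label: {label!r}")
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start
    return start, end


def split_fused(units: Dict[str, str]) -> Tuple[Dict[int, str], Set[int]]:
    """Separate single verses (alignable one-to-one) from fused verse ranges.

    Returns (single_verses, fused_verse_numbers). Under the assumed policy,
    verses in fused_verse_numbers are excluded from annotation projection.
    """
    single: Dict[int, str] = {}
    fused: Set[int] = set()
    for label, text in units.items():
        start, end = parse_verse_label(label)
        if start == end:
            single[start] = text
        else:
            # Every verse covered by the range loses its own boundary.
            fused.update(range(start, end + 1))
    return single, fused
```

Applied to a chapter laid out like the ERV's Matthew 9, with units labeled "4", "5-6", and "7", this keeps verses 4 and 7 and marks 5 and 6 as fused. Dropping fused verses trades some coverage for labels that still correspond to exactly one English verse.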