Just Use XML: Revisiting Joint Translation and Label Projection

Thennal D K, Chris Biemann, Hans Ole Hatzel

Abstract

Label projection is an effective technique for cross-lingual transfer, extending span-annotated datasets from a high-resource language to low-resource ones. Most approaches perform label projection as a separate step after machine translation, and prior work that combines the two reports degraded translation quality. We re-evaluate this claim with LabelPigeon, a novel framework that jointly performs translation and label projection via XML tags. We design a direct evaluation scheme for label projection, and find that LabelPigeon outperforms baselines and actively improves translation quality in 11 languages. We further assess translation quality across 203 languages and varying annotation complexity, finding consistent improvement attributed to additional fine-tuning. Finally, across 27 languages and three downstream tasks, we report substantial gains in cross-lingual transfer over comparable work, up to +39.9 F1 on NER. Overall, our results demonstrate that XML-tagged label projection provides effective and efficient label transfer without compromising translation quality.

Paper Structure (39 sections, 4 figures, 11 tables)

Figures (4)

  • Figure 1: An example taken from XQuAD (Artetxe et al., 2020), where LabelPigeon accurately and seamlessly handles translating English to German while transferring 7 labeled spans with nesting.
  • Figure 2: Examples of labeled English sentences with two equally valid translations, where the labeled span is preserved in one and split, omitted, or ambiguous in the other.
  • Figure 3: An example showcasing the tag swap that we conduct on training data in order to make it generally applicable.
  • Figure 4: Translation performance of our model on Flores-200 as measured by chrF++ across different values of $P_{close}$ and $P_{open}$ under the Complex marker insertion scheme.
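
To make the idea of carrying labels through translation via XML tags concrete, the following is a minimal sketch of the pre- and post-processing around a translation model. It is an illustration under stated assumptions, not the LabelPigeon implementation: the function names `insert_tags` and `extract_spans`, the single-letter tag names, and the character-offset span format are all hypothetical, and the translation step itself is stood in for by a hand-written German string. The sketch assumes spans may nest but never cross, and that the text contains no literal angle brackets.

```python
import re

# Hypothetical tag syntax: <name> ... </name>, with no attributes.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")


def insert_tags(text, spans):
    """Wrap each (start, end, label) character span in <label>...</label>.

    Assumes end is exclusive, start < end, and spans nest without crossing.
    """
    events = []
    for idx, (start, end, label) in enumerate(spans):
        # At the same position, outer spans must open first (larger end first).
        events.append((start, 1, (-end, idx), f"<{label}>"))
        # At the same position, inner spans must close first (larger start first).
        events.append((end, 0, (-start, -idx), f"</{label}>"))
    # kind 0 (close) sorts before kind 1 (open) at the same position.
    events.sort()

    pieces, prev = [], 0
    for pos, _kind, _key, tag in events:
        pieces.append(text[prev:pos])
        pieces.append(tag)
        prev = pos
    pieces.append(text[prev:])
    return "".join(pieces)


def extract_spans(tagged):
    """Strip tags from tagged text and recover (start, end, label) spans."""
    plain, length, pos = [], 0, 0
    stack, spans = [], []
    for match in TAG.finditer(tagged):
        chunk = tagged[pos:match.start()]
        plain.append(chunk)
        length += len(chunk)
        pos = match.end()
        closing, label = match.group(1), match.group(2)
        if not closing:
            stack.append((label, length))
        else:
            if not stack or stack[-1][0] != label:
                raise ValueError(f"unbalanced closing tag </{label}>")
            _, start = stack.pop()
            spans.append((start, length, label))
    if stack:
        raise ValueError(f"unclosed tag <{stack[-1][0]}>")
    plain.append(tagged[pos:])
    return "".join(plain), sorted(spans)


source = "Barack Obama visited Berlin."
source_spans = [(0, 12, "a"), (7, 12, "b"), (21, 27, "c")]
tagged_source = insert_tags(source, source_spans)

# Stand-in for the model output; a real system would translate tagged_source.
tagged_target = "<a>Barack <b>Obama</b></a> besuchte <c>Berlin</c>."
target, target_spans = extract_spans(tagged_target)
```

The stack-based parser rejects unbalanced or crossing tags rather than trying to repair them, which is a design choice for the sketch only; how malformed model output should be handled in practice is left open here.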