
Cross-lingual, Character-Level Neural Morphological Tagging

Ryan Cotterell, Georg Heigold

TL;DR

This work addresses the scarcity of supervised data for morphological tagging in many languages by proposing a cross-lingual, character-level neural transfer framework. It casts each language as a task in a multi-task learning setting and enforces shared character representations across related languages, exploring three architectures: language-universal softmax, language-specific softmax, and a joint tagging-plus-language-identification model. Across 18 languages from Romance, Germanic, Slavic, and Uralic families, the approach transfers morphology knowledge from high-resource to low-resource languages, outperforming alignment-based projection and MarMoT baselines and sometimes achieving gains up to 2–3 percentage points with multi-source transfer. The results demonstrate that transfer quality correlates with linguistic relatedness and that multi-source transfer can further improve performance, offering a viable path for deploying high-quality morphological taggers in low-resource settings with modest target-language supervision.

Abstract

Even for common NLP tasks, sufficient supervision is not available in many languages -- morphological tagging is no exception. In the work presented here, we explore a transfer learning scheme, whereby we train character-level recurrent neural taggers to predict morphological taggings for high-resource languages and low-resource languages together. Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.
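To make the idea of shared character representations concrete, the following is a minimal sketch of the language-specific softmax variant: one character encoder is shared across languages, while each language keeps its own output layer. Everything here is an illustrative assumption rather than the authors' implementation. The class names, the embedding size, the toy tag sets, and the untrained random weights are all hypothetical, and averaging character embeddings merely stands in for the character-level recurrent encoder.

```python
import math
import random

random.seed(0)

EMB_DIM = 8  # hypothetical embedding size, chosen only for illustration


def make_matrix(rows, cols):
    """Small random weight matrix as a list of lists (untrained)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedCharEncoder:
    """Character embeddings shared by every language.

    Averaging the embeddings is a simple stand-in for a character-level
    recurrent encoder; it is an assumption made to keep the sketch short.
    """

    def __init__(self, alphabet):
        self.index = {ch: i for i, ch in enumerate(sorted(set(alphabet)))}
        self.table = make_matrix(len(self.index), EMB_DIM)

    def encode(self, word):
        rows = [self.table[self.index[ch]] for ch in word]
        return [sum(col) / len(rows) for col in zip(*rows)]


class LanguageSpecificTagger:
    """One shared encoder plus one softmax output layer per language."""

    def __init__(self, encoder, tagsets):
        self.encoder = encoder
        self.tagsets = tagsets
        self.heads = {
            lang: make_matrix(len(tags), EMB_DIM) for lang, tags in tagsets.items()
        }

    def tag_distribution(self, word, lang):
        vec = self.encoder.encode(word)  # same parameters for all languages
        scores = [sum(w * x for w, x in zip(row, vec)) for row in self.heads[lang]]
        return dict(zip(self.tagsets[lang], softmax(scores)))


# Toy tag sets (hypothetical); real morphological tag sets are far larger.
tagsets = {"es": ["NOUN", "VERB"], "ca": ["NOUN", "VERB", "ADJ"]}
encoder = SharedCharEncoder("abcdefghijklmnopqrstuvwxyz")
tagger = LanguageSpecificTagger(encoder, tagsets)

dist_ca = tagger.tag_distribution("casa", "ca")
dist_es = tagger.tag_distribution("casa", "es")
```

In a trained version of such a model, gradients from the high-resource language would update the shared encoder, which is how the low-resource language could benefit. A language-universal softmax would instead replace the per-language heads with a single output layer over the union of all tags.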

Paper Structure

This paper contains 31 sections, 10 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Example of a morphologically tagged sentence in Russian using the annotation scheme provided in the UD dataset.
  • Figure 2: We depict four subarchitectures used in the models we develop in this work. Combining (a) with the character representations in (c) gives the vanilla morphological tagging architecture of Heigold et al. (2017). Combining (a) with (d) yields the language-universal softmax architecture, and combining (b) with (c) yields our joint model for language identification and tagging.
  • Figure 3: A learning curve for Spanish and Catalan comparing our monolingual model, our joint model, and two MarMoT models. The first MarMoT model is identical to those trained in the rest of the paper; the second uses a multi-task approach, which failed, so no further experiments were performed with it.