TAMS: Translation-Assisted Morphological Segmentation

Enora Rice; Ali Marashian; Luke Gessler; Alexis Palmer; Katharina von der Wense

TL;DR

This work proposes a character-level sequence-to-sequence model that incorporates representations of translations, obtained from pretrained high-resource monolingual language models, as an additional signal. The model shows promise in severely resource-constrained settings.

Abstract

Canonical morphological segmentation is the process of analyzing words into the standard (i.e., underlying) forms of their constituent morphemes. This is a core task in language documentation, and NLP systems have the potential to dramatically speed up this process. In typical language documentation settings, however, training data for canonical morpheme segmentation is scarce, making it difficult to train high-quality models. Translation data is often much more abundant, and in this work we present a method that attempts to leverage this data in the canonical segmentation task. We propose a character-level sequence-to-sequence model that incorporates representations of translations, obtained from pretrained high-resource monolingual language models, as an additional signal. Our model outperforms the baseline in a very low-resource setting but yields mixed results on training splits with more data. While further work is needed to make translations useful in higher-resource settings, our model shows promise in severely resource-constrained settings.
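To make the task definition concrete, the following is a minimal sketch of how a single training example for a character-level sequence-to-sequence segmenter might be represented. It is an illustration under stated assumptions, not the authors' data format: the class name `SegmentationExample`, the `-` boundary symbol, and the English word with a French translation are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical boundary symbol; the paper's actual target format is not given here.
BOUNDARY = "-"


@dataclass
class SegmentationExample:
    """One illustrative example: a word, its canonical morphemes, and a translation."""

    word: str                       # surface form as it appears in text
    canonical_morphemes: List[str]  # underlying forms of the morphemes
    translation: str                # translation, available as an extra signal

    def source_chars(self) -> List[str]:
        # Character-level input sequence for the encoder.
        return list(self.word)

    def target_chars(self) -> List[str]:
        # Character-level output sequence: morphemes joined by the boundary symbol.
        return list(BOUNDARY.join(self.canonical_morphemes))


# Illustrative English example (an assumption, not taken from the paper's data).
example = SegmentationExample(
    word="happiness",
    canonical_morphemes=["happy", "ness"],
    translation="bonheur",
)

print(example.source_chars())
print("".join(example.target_chars()))
```

In this example the concatenated canonical morphemes ("happyness") differ from the surface word ("happiness"). That gap is what separates canonical segmentation from simply splitting the surface string, and it is why the task is naturally framed as sequence generation.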

Paper Structure (38 sections, 5 equations, 2 figures, 13 tables)


Introduction
Related Work
Modeling Morphological Segmentation
Morphological Information within Embeddings
Incorporating Translations into Morphological Segmentation Models
Encoder-Decoder Networks
Translation Assistance
Alignment
Translation Representation
Incorporation Strategies
Data
Languages
Tsez
Lezgi
Arapaho
...and 23 more sections

Figures (2)

Figure 1: Canonical segmentation of the English word "Cylindrically"
Figure 2: Manual Word Alignment of Tsez IGT: Now the boy went home
