Paradigm Completion for Derivational Morphology
Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, David Yarowsky
TL;DR
This work addresses the underexplored problem of generating derived word forms. It frames derivation as a paradigm with semantic slots and applies neural sequence-to-sequence (seq2seq) models adapted from inflection generation. It introduces a NomBank-based dataset of 6,029 English derivational triples and evaluates a neural seq2seq model against a baseline transducer using accuracy, edit distance, and affix F1. The neural approach achieves about 71.7% accuracy, beating the baseline by 16.4 percentage points, but remains well below inflection-generation performance, which suggests that additional semantic, historical, and lexical factors need to be modeled. The results demonstrate feasibility and point to future work on data annotation and model enhancements to narrow that gap.
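To make two of the evaluation metrics named above concrete, here is a minimal Python sketch of exact-match accuracy and mean Levenshtein edit distance over predicted and gold derived forms. It is an illustration under stated assumptions, not the authors' evaluation script: the function names and the example words are hypothetical and not drawn from the dataset, and affix F1 is left out because its exact definition is not given here.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def accuracy(preds: List[str], golds: List[str]) -> float:
    """Fraction of predictions that match the gold form exactly."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal in length")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def mean_edit_distance(preds: List[str], golds: List[str]) -> float:
    """Average edit distance between each prediction and its gold form."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal in length")
    return sum(levenshtein(p, g) for p, g in zip(preds, golds)) / len(golds)


# Hypothetical example forms, chosen only for illustration.
golds = ["intensity", "amusement", "equivalence"]
preds = ["intensity", "amusement", "equivalency"]

acc = accuracy(preds, golds)
med = mean_edit_distance(preds, golds)
```

Accuracy gives no credit for a near miss such as "equivalency" for "equivalence", whereas edit distance records that the prediction is one character away, which is why the two metrics are reported together.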
Abstract
The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task. We give an overview of the theoretical motivation for a paradigmatic treatment of derivational morphology, and introduce the task of derivational paradigm completion as a parallel to inflectional paradigm completion. State-of-the-art neural models, adapted from the inflection task, are able to learn a range of derivation patterns, and outperform a non-neural baseline by 16.4 percentage points. However, due to semantic, historical, and lexical considerations involved in derivational morphology, future work will be needed to achieve performance parity with inflection-generating systems.
