Recent advancements in computational morphology: A comprehensive survey

Jatayu Baxi, Brijesh Bhatt

TL;DR

This survey assesses computational morphology from traditional rule-based methods (Two-Level Morphology, FSTs, paradigms, and stemmers) to data-driven machine learning and contemporary Transformer-based models. It catalogs unsupervised and supervised learning approaches, dives into deep architectures for segmentation and tagging, and highlights joint lemma–MSD modeling and cross-lingual transfer for low-resource languages. The paper also reviews major morphology datasets (UD Treebanks, UniMorph) and analyzes the relative strengths, data requirements, and applicability across languages. It concludes with open research issues, including data scarcity, interpretability, and the need for broad, multilingual resources to advance robust morphology tooling.

Abstract

Computational morphology handles language processing at the word level. It is one of the foundational tasks in the NLP pipeline for the development of higher-level NLP applications, and it mainly deals with the processing of words and word forms. Computational morphology addresses various subproblems such as morpheme boundary detection, lemmatization, morphological feature tagging, and morphological reinflection. In this paper, we present an exhaustive survey of methods for developing computational morphology tools. We survey the literature in chronological order, from conventional methods to the recent evolution of deep neural network based approaches. We also review the existing datasets available for this task across languages. We discuss the effectiveness of neural models compared with traditional models and present some unique challenges associated with building computational morphology tools. We conclude by discussing some recent and open research issues in this field.

Paper Structure

This paper contains 23 sections, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: (a) A morphological analyzer produces the lemma and MSD tags for a given inflected word. (b) A morphological generator produces the inflected form when the lemma and MSD tags are given as input. (Source: https://doi.org/10.48550/arxiv.2105.09404)
  • Figure 2: A morphological analyzer-generator can be implemented as a finite-state transducer. Following the Xerox convention, the lower-side language consists of surface strings and the upper-side language consists of their analyses. The transducer is a data structure, and the procedure for applying it in either direction is language-independent. (Source: beesley1998arabic)
  • Figure 3: Information processing in the paradigm-based approach (Source: Dash2021)
  • Figure 4: Supervised morphological analyzer (Source: mokanarangan2016tamil)
  • Figure 5: Different neural network learning scenarios (Source: https://doi.org/10.48550/arxiv.2105.09404)
  • ...and 6 more figures
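
The analyzer and generator in Figure 1, and the bidirectional transducer in Figure 2, can be read as one relation between analyses (lemma plus MSD tags) and surface forms, queried in either direction. The following is a minimal sketch of that idea, not the survey's implementation. The lexicon entries, the tag names, and the function names `analyze` and `generate` are all hypothetical and chosen only for illustration. A real finite-state transducer encodes this relation compactly and generalizes over stems and affixes, which this lookup table does not.

```python
from typing import FrozenSet, Iterable, List, Tuple

# An analysis is a lemma plus a set of MSD tags (hypothetical tag names).
Analysis = Tuple[str, FrozenSet[str]]

# Hypothetical toy lexicon: (analysis, surface form) pairs.
# In Xerox terms, the analysis is the upper side and the surface form the lower side.
LEXICON: List[Tuple[Analysis, str]] = [
    (("walk", frozenset({"V", "PST"})), "walked"),
    (("walk", frozenset({"V", "V.PTCP", "PST"})), "walked"),
    (("walk", frozenset({"V", "3", "SG", "PRS"})), "walks"),
    (("mouse", frozenset({"N", "PL"})), "mice"),
]


def analyze(surface: str) -> List[Analysis]:
    """Surface form -> all matching (lemma, tags) analyses."""
    return [analysis for analysis, form in LEXICON if form == surface]


def generate(lemma: str, tags: Iterable[str]) -> List[str]:
    """Lemma and MSD tags -> all matching surface forms."""
    key = (lemma, frozenset(tags))
    return [form for analysis, form in LEXICON if analysis == key]


if __name__ == "__main__":
    print(analyze("walked"))
    print(generate("mouse", ["N", "PL"]))
```

Because both functions read the same table, analysis and generation stay consistent by construction. An ambiguous form such as "walked" returns more than one analysis, which is why analyzers are usually followed by a disambiguation step.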