CommonMorph: Participatory Morphological Documentation Platform

Aso Mahmudi, Sina Ahmadi, Kemal Kurniawan, Rico Sennrich, Eduard Hovy, Ekaterina Vylomova

Abstract

Collecting and annotating morphological data present significant challenges, requiring linguistic expertise, methodological rigour, and substantial resources. These barriers are particularly acute for low-resource languages and varieties. To accelerate this process, we introduce CommonMorph, a comprehensive platform that streamlines morphological data collection through a three-tiered approach: expert linguistic definition, contributor elicitation, and community validation. The platform minimises manual work by incorporating active learning, annotation suggestions, and tools to import and adapt materials from related languages. It accommodates diverse morphological systems, including fusional, agglutinative, and root-and-pattern morphologies. Its open-source design and UniMorph-compatible outputs ensure accessibility and interoperability with NLP tools. Our platform is accessible at https://common-morph.com, offering a replicable model for preserving linguistic diversity through collaborative technology.

Paper Structure

This paper contains 19 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The CommonMorph platform workflow facilitates elaboration of morphological structures by a linguist and provides an interoperable ecosystem for contributors to validate and enrich labelled databases.
  • Figure 2: Linguists define (a) paradigm structures, (b) additional structural layers such as agreement, and (c) lexical entries. These components allow the generation of morphological forms that can be validated during elicitation. Linguists also create elicitation prompts (d), taking inspiration from examples produced by the system.
  • Figure 3: Example of a Latin verb conjugation table annotated with CommonMorph terminology: (1) An inflection class and (2) a paradigm structure within that inflection class; (3) a reusable layer contains several (4) reusable morphemes and their (5) morphosyntactic features; (6) a lemma and its (7) gloss and (8–9) different stems.
  • Figure 4: Screenshots from the speaker interface.