Natural Language Processing for Tigrinya: Current State and Future Directions

Fitsum Gaim, Jong C. Park

TL;DR

Tigrinya NLP has long faced data scarcity and complex morphology, limiting substantive progress. This survey analyzes 50+ studies across 15 tasks (2011–2025), tracing a clear shift from rule-based systems to neural architectures enabled by gradually expanding resources and benchmarks. It highlights milestone datasets, monolingual and multilingual language models, and cross-lingual transfer as key drivers, while also documenting persistent challenges in data, tools, and bias. The work provides a replicable methodology and a practical roadmap for advancing Tigrinya NLP through community engagement, open resource development, and targeted morphology-aware and cross-lingual approaches.

Abstract

Despite being spoken by millions of people, Tigrinya remains severely underrepresented in Natural Language Processing (NLP) research. This work presents a comprehensive survey of NLP research for Tigrinya, analyzing over 50 studies from 2011 to 2025. We systematically review the current state of computational resources, models, and applications across fifteen downstream tasks, including morphological processing, part-of-speech tagging, named entity recognition, machine translation, question answering, speech recognition, and speech synthesis. Our analysis reveals a clear trajectory from foundational, rule-based systems to modern neural architectures, with progress consistently driven by milestones in resource creation. We identify key challenges rooted in Tigrinya's morphological properties and resource scarcity, and highlight promising research directions, including morphology-aware modeling, cross-lingual transfer, and community-centered resource development. This work serves both as a reference for researchers and as a roadmap for advancing Tigrinya NLP. An anthology of surveyed studies and resources is publicly available.

Paper Structure

This paper contains 36 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Timeline and Distribution of Tigrinya NLP Research by Task Area (2011–2025). Bubble size indicates the number of publications in a given year.