NLP Progress in Indigenous Latin American Languages

Atnafu Lambebo Tonja, Fazlourrahman Balouchzahi, Sabur Butt, Olga Kolesnikova, Hector Ceballos, Alexander Gelbukh, Thamar Solorio

TL;DR

Indigenous Latin American languages are underrepresented in NLP, risking loss of cultural heritage. The paper surveys NLP progress in the region via ACL Anthology and complements it with a community-focused survey to identify challenges and needs. Findings show uneven representation across countries and tasks, with machine translation dominating the literature, and a notable acceleration in activity since 2021 through AmericasNLP. The work offers a practical roadmap for researchers and communities to foster inclusive, ethically guided NLP development that supports language preservation and cultural sovereignty.

Abstract

The paper focuses on the marginalization of indigenous language communities in the face of rapid technological advancements. We highlight the cultural richness of these languages and the risk they face of being overlooked in the realm of Natural Language Processing (NLP). We aim to bridge the gap between these communities and researchers, emphasizing the need for inclusive technological advancements that respect indigenous community perspectives. We present the NLP progress of indigenous Latin American languages together with a survey that covers the status of indigenous languages in Latin America, their representation in NLP, and the challenges and innovations required for their preservation and development. The paper contributes to the current literature on the need for and progress of NLP for indigenous communities of Latin America in particular, and for low-resource and indigenous communities in general.

Paper Structure

This paper contains 17 sections and 7 figures.

Figures (7)

  • Figure 1: Modern Indigenous Languages in Latin America. Source: https://adockrill.blogspot.com/2012/05/map-of-contemporary-latin-america.html
  • Figure 2: Number of indigenous languages present in NLP research per country
  • Figure 3: Number of publications per language vs tasks. We did not include publications, tasks, and languages from shared tasks like AmericasNLP in these statistics.
  • Figure 4: Total number of publications per task
  • Figure 5: Publication per year, venues, and publication types
  • ...and 2 more figures