Generative AI and Large Language Models in Language Preservation: Opportunities and Challenges

Vincent Koc

TL;DR

Generative AI and LLMs offer transformative potential for preserving endangered languages, but their deployment raises risks around data sovereignty, bias, and cultural misrepresentation. The paper proposes an analytical framework that aligns GenAI capabilities with language communities' needs, embedding ethical safeguards and governance. Te Reo Māori serves as a detailed worked example, illustrating community-led ASR that reaches 92% accuracy and actionable strategies for community-led development. The paper also introduces the ImpactScore rubric to guide responsible prioritization of interventions, and discusses future research directions in low-resource learning, explainability, and culturally resonant metrics.

Abstract

The global crisis of language endangerment meets a technological turning point as Generative AI (GenAI) and Large Language Models (LLMs) open new possibilities for automating corpus creation, transcription, translation, and tutoring. However, this promise is undermined by fragmented practices and the lack of a methodology for balancing LLM capabilities against the risks of data scarcity, cultural misappropriation, and ethical missteps. This paper introduces a novel analytical framework that systematically evaluates GenAI applications against language-specific needs, embedding community governance and ethical safeguards as foundational pillars. We demonstrate its efficacy through a worked example on Te Reo Māori revitalization, where it highlights successes, such as community-led Automatic Speech Recognition (ASR) achieving 92% accuracy, while surfacing persistent challenges in data sovereignty and model bias for digital archives and educational tools. Our findings underscore that GenAI can transform language preservation, but only when interventions are anchored in community-centric data stewardship, continuous evaluation, and transparent risk management. The framework gives researchers, language communities, and policymakers a toolkit for the ethical and high-impact deployment of LLMs to safeguard the world's linguistic heritage.

Paper Structure (25 sections, 2 equations, 3 figures, 2 tables)

Introduction
Background
Methodology
Literature Review and Synthesis
Framework for Opportunity and Challenge Identification
Ethical and Cultural Impact Assessment
Case Study Analysis
Synthesis and Guideline Formulation
Framework Application: A Worked Example with Te Reo Māori
Applying Generative AI in Language Preservation: Analysis and Insights
Opportunities in AI-driven Language Preservation
Challenges in AI-driven Language Preservation
Ethical and Cultural Dimensions
Proposed ImpactScore Rubric for Intervention Assessment
Conclusion and Future Work
...and 10 more sections

Figures (3)

Figure 1: Taxonomy of Opportunities and Challenges in Applying Generative AI to Language Preservation.
Figure 2: The proposed analytical framework detailing inputs, core processes (numbered 1 to 3 to visually echo the text), a functional mapping summary, and outputs for assessing GenAI applications in language preservation.
Figure 3: A human-centered framework for GenAI initiatives, illustrating the dual phases of problem identification and solution implementation, with iterative feedback between readiness, strategy, use case discovery, operating model, infrastructure, and awareness. Adapted and redesigned based on PwC (2025).
