Hybrid Neural-LLM Pipeline for Morphological Glossing in Endangered Language Documentation: A Case Study of Jungar Tuvan

Siyu Liang; Talant Mawkanuli; Gina-Anne Levow

TL;DR

This work establishes concrete design principles for integrating structured prediction models with LLM reasoning in morphologically complex fieldwork contexts, and shows that hybrid architectures are a promising direction for computationally light automatic linguistic annotation in endangered language documentation.

Abstract

Interlinear glossed text (IGT) creation remains a major bottleneck in linguistic documentation and fieldwork, particularly for low-resource morphologically rich languages. We present a hybrid automatic glossing pipeline that combines neural sequence labeling with large language model (LLM) post-correction, evaluated on Jungar Tuvan, a low-resource Turkic language. Through systematic ablation studies, we show that retrieval-augmented prompting provides substantial gains over random example selection. We further find that morpheme dictionaries paradoxically hurt performance compared to providing no dictionary at all in most cases, and that performance scales approximately logarithmically with the number of few-shot examples. Most significantly, our two-stage pipeline combining a BiLSTM-CRF model with LLM post-correction yields substantial gains for most models, achieving meaningful reductions in annotation workload. Drawing on these findings, we establish concrete design principles for integrating structured prediction models with LLM reasoning in morphologically complex fieldwork contexts. These principles demonstrate that hybrid architectures offer a promising direction for computationally light solutions to automatic linguistic annotation in endangered language documentation.
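The retrieval-augmented prompting and LLM post-correction described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not specify the retriever, so character-trigram overlap stands in for it; the prompt wording, the `segmented`/`gloss` field names, and the helper functions are hypothetical; and the example forms are invented placeholders, not Jungar Tuvan data.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the multiset of character n-grams of a boundary-padded string."""
    padded = f"#{text}#"
    return Counter(padded[i:i + n] for i in range(max(len(padded) - n + 1, 1)))


def similarity(a, b):
    """Multiset Jaccard overlap of character trigrams (an assumed metric)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    inter = sum((ga & gb).values())   # element-wise minimum of counts
    union = sum((ga | gb).values())   # element-wise maximum of counts
    return inter / union if union else 0.0


def retrieve(query, pool, k):
    """Pick the k pool examples whose source line is most similar to the query."""
    ranked = sorted(pool, key=lambda ex: similarity(query, ex["segmented"]),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, shots, draft=None):
    """Assemble a few-shot prompt; optionally include a first-stage draft gloss
    (e.g. from a sequence labeler) for the LLM to correct."""
    lines = ["Gloss each morpheme of the final line."]
    for ex in shots:
        lines.append(f"Segmented: {ex['segmented']}")
        lines.append(f"Gloss: {ex['gloss']}")
    lines.append(f"Segmented: {query}")
    if draft is not None:
        lines.append(f"Draft gloss (may contain errors): {draft}")
    lines.append("Gloss:")
    return "\n".join(lines)


# Invented placeholder forms and labels, for illustration only.
pool = [
    {"segmented": "tok-a mel-u", "gloss": "STEM1-A STEM2-U"},
    {"segmented": "tok-u ran-a", "gloss": "STEM1-U STEM3-A"},
    {"segmented": "zib-i", "gloss": "STEM4-I"},
]
query = "tok-a ran-a"
shots = retrieve(query, pool, k=2)
prompt = build_prompt(query, shots, draft="STEM1-A STEM3-U")
```

Passing `draft=None` gives the pure retrieval-augmented few-shot setting, while supplying a draft corresponds to the two-stage setting in which the LLM corrects a first-pass prediction. Random example selection, the baseline mentioned in the abstract, would replace `retrieve` with a random sample from `pool`.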

Paper Structure (36 sections, 1 equation, 4 figures, 5 tables)

Introduction
Related Work
IGT and Language Documentation
Automatic Morphological Analysis and Glossing
LLMs for Linguistic Annotation
Hybrid and Multi-Stage Architectures
Data
Language and Corpus
Data Split
Methodology
Task Formalization
BiLSTM-CRF Model
LLM Configuration and Prompting
Experimental Design
Results
...and 21 more sections

Figures (4)

Figure 1: Hybrid pipeline combining BiLSTM-CRF structured prediction with LLM post-correction using retrieval-augmented prompting.
Figure 2: Experiment 2: n-shot scaling curves for RAG LLM generation. Performance scales approximately logarithmically with example count, plateauing around n=10–15 for most models. The BiLSTM baseline (0.474) is provided in the text for reference.
Figure 3: Experiment 3: glossary ablation results. Partial glossaries (Top-100, Grammatical) hurt performance compared to no glossary, while complete glossaries show modest gains. The negative effect suggests models are usually distracted by morphological information. The BiLSTM baseline (0.474) is provided in the text for reference.
Figure 4: Experiment 4: hybrid pipeline improvement over RAG LLM generation. Solid lines show hybrid accuracy (BiLSTM + LLM correction), dashed lines show pure N-Shot baseline from Experiment 2, and shaded areas indicate improvement. The hybrid approach consistently improves performance across all four models, particularly in low-shot scenarios (n=1–5).
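
The approximately logarithmic scaling described for Figure 2 can be written as a simple fitted form. This is an illustrative sketch only: the intercept a and slope b are hypothetical fit parameters that are not reported in the text here, and the form is meant to describe the rising part of the curve before the plateau.

```latex
\mathrm{acc}(n) \approx a + b \ln n,
\qquad
\mathrm{acc}(2n) - \mathrm{acc}(n) \approx b \ln 2
```

Under this form each doubling of the number of few-shot examples adds roughly the same increment, so the gain per additional example shrinks as n grows.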
