Construction and educational application of a linguistically grounded dependency treebank for Uyghur

Jiaxin Zuo; Yiquan Wang; Yuan Pan; Xiadiya Yibulayin

TL;DR

Uyghur’s agglutinative morphology and frequent zero copula present challenges for education-focused NLP under Universal Dependencies. The authors propose MUDT, a linguistically grounded dependency framework with a four-layer morphological decomposition and targeted dependency relations (zero copula, postpositional head, and compound predicates), and use a hybrid AI–human pipeline to build a treebank of 3,456 sentences. Intrinsic evaluation shows a substantial reduction in non-projectivity, and extrinsic evaluation shows improved parsing performance. A prototype AI-assisted grammar tutor yields significantly higher learning gains than the control condition (mean gain $13.73$ vs $7.88$, $p=0.018$, $d=0.90$). The work shows that preserving fine-grained morphosyntactic information yields pedagogically actionable feedback and stronger educational outcomes for low-resource languages, with data and code available for replication.

Abstract

Developing effective educational technologies for low-resource agglutinative languages like Uyghur is often hindered by the mismatch between existing annotation frameworks and the grammatical structures of the language. To address this challenge, this study introduces the Modern Uyghur Dependency Treebank (MUDT), a linguistically grounded annotation framework designed to capture the agglutinative complexity of Uyghur, including zero copula constructions and fine-grained case marking. A hybrid pipeline combining Large Language Model pre-annotation with rigorous human correction was used to construct a high-quality treebank of 3,456 sentences. Intrinsic structural evaluation reveals that MUDT significantly improves dependency projectivity, reducing the crossing-arc rate from 7.35\% under the Universal Dependencies standard to 0.06\%. Extrinsic parsing experiments using UDPipe and Stanza further demonstrate that models trained on MUDT achieve higher in-domain accuracy and better cross-domain generalization than UD-based baselines. To validate the practical utility of this computational resource, an AI-assisted grammar tutoring system was developed to translate MUDT-based syntactic analyses into interpretable pedagogical feedback. A controlled experiment involving 35 second-language learners indicated that students receiving syntax-aware feedback achieved significantly higher learning gains than those in a control group. These findings establish MUDT as a robust foundation for syntactic analysis and underscore the critical role of linguistically informed natural language processing resources in bridging the gap between computational models and the cognitive needs of second-language learners.
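
The crossing-arc rate mentioned in the abstract can be illustrated with a minimal sketch. It assumes the rate is the fraction of dependency arcs that cross at least one other arc in the same sentence; the paper's exact definition may differ (for example, it could be counted per sentence). The function names and the head-list input format are hypothetical and are not taken from the authors' code.

```python
from itertools import combinations


def crossing_arcs(heads):
    """Return the set of arcs (as (left, right) spans) that cross another arc.

    heads[i] is the 1-based head position of token i+1; 0 marks the root.
    This input format is an assumption made for illustration.
    """
    # Store each arc as an ordered span and skip the artificial root arc.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1) if h != 0]
    crossing = set()
    for (a, b), (c, d) in combinations(arcs, 2):
        # Two arcs cross when exactly one endpoint of one lies strictly inside the other.
        if a < c < b < d or c < a < d < b:
            crossing.add((a, b))
            crossing.add((c, d))
    return crossing


def crossing_arc_rate(treebank):
    """Fraction of non-root arcs in the treebank that take part in a crossing."""
    total = sum(sum(1 for h in heads if h != 0) for heads in treebank)
    crossed = sum(len(crossing_arcs(heads)) for heads in treebank)
    return crossed / total if total else 0.0


# Toy data: the first tree is projective, the second has one pair of crossing arcs.
toy_treebank = [[2, 0, 2], [3, 4, 0, 3]]
print(crossing_arc_rate(toy_treebank))
```

In the second toy sentence, the arc between tokens 1 and 3 crosses the arc between tokens 2 and 4, so two of the five arcs in the toy treebank are counted as crossing.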
