Syntactic Transfer to Kyrgyz Using the Treebank Translation Method

Anton Alekseev; Alina Tillabaeva; Gulnara Dzh. Kabaeva; Sergey I. Nikolenko

TL;DR

This paper addresses the difficulty of building high-quality Kyrgyz syntactic corpora. It proposes a semi-automatic, cross-lingual transfer method that projects syntactic annotations from Turkish to Kyrgyz using a treebank-translation approach. The pipeline combines Turkish dependency parses, machine translation (including GPT-4o with task-focused prompts), word-alignment-based annotation projection, and lemmatization via apertium-kir. Evaluation on the TueCL UD Kyrgyz treebank shows that this approach yields higher syntactic annotation accuracy than a monolingual model trained on the Kyrgyz KTMU treebank. The paper also introduces a method for gauging the complexity of manual annotation. The work provides a reusable Python package and demonstrates a practical route to rapidly expanding Kyrgyz syntactic resources, one that may carry over to other low-resource languages.

Abstract

Kyrgyz is a low-resource language, and creating high-quality syntactic corpora for it requires significant effort. This study proposes an approach that simplifies the development of a syntactic corpus for Kyrgyz. We present a tool for transferring syntactic annotations from Turkish to Kyrgyz based on a treebank translation method. The tool was evaluated on the TueCL treebank. The results show that this approach achieves higher syntactic annotation accuracy than a monolingual model trained on the Kyrgyz KTMU treebank. The study also introduces a method for assessing the complexity of manually annotating the resulting syntactic trees, which can help further optimize the annotation process.
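
To make the idea of annotation projection concrete, the following is a minimal sketch of how dependency heads and labels might be carried from a parsed Turkish sentence to its Kyrgyz translation through a word alignment. It is not the authors' implementation: the `Token` type, the `project_tree` function, the one-to-one alignment dictionary, and the example sentence pair are all illustrative assumptions, and a real pipeline would obtain the parse, the translation, and the alignment from external tools.

```python
from typing import Dict, List, NamedTuple, Optional


class Token(NamedTuple):
    """One token of a dependency tree (hypothetical, simplified structure)."""
    form: str
    head: Optional[int]  # 1-based index of the head, 0 for root, None if unknown
    deprel: str          # dependency label, "_" if unknown


def project_tree(
    source: List[Token],
    target_forms: List[str],
    alignment: Dict[int, int],
) -> List[Token]:
    """Project heads and labels from `source` onto the target tokens.

    `alignment` maps 1-based source indices to 1-based target indices and is
    assumed to be one-to-one. Target tokens without an aligned source token
    keep head=None and deprel="_" so they can be flagged for manual annotation.
    """
    projected = [Token(form, None, "_") for form in target_forms]
    for src_idx, tgt_idx in alignment.items():
        src_tok = source[src_idx - 1]
        # If the source head has no aligned counterpart, climb to the
        # nearest aligned ancestor (or to the root).
        head = src_tok.head
        while head != 0 and head not in alignment:
            head = source[head - 1].head
        tgt_head = 0 if head == 0 else alignment[head]
        projected[tgt_idx - 1] = Token(
            target_forms[tgt_idx - 1], tgt_head, src_tok.deprel
        )
    return projected


# Illustrative sentence pair (not taken from the paper): "I read a book".
turkish = [
    Token("Ben", 3, "nsubj"),
    Token("kitap", 3, "obj"),
    Token("okudum", 0, "root"),
]
kyrgyz_forms = ["Мен", "китеп", "окудум"]
word_alignment = {1: 1, 2: 2, 3: 3}  # assumed output of a word aligner

projected = project_tree(turkish, kyrgyz_forms, word_alignment)
for index, token in enumerate(projected, start=1):
    print(index, token.form, token.head, token.deprel)
```

In this sketch, unaligned target tokens are left unannotated rather than guessed, which keeps the automatic output conservative and leaves those tokens for a human annotator to resolve.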
