Enhancing Korean Dependency Parsing with Morphosyntactic Features
Jungyeul Park, Yige Chen, Kyuwon Kim, KyungTae Lim, Chulwoo Park
TL;DR
This work addresses the challenge of processing morphologically rich Korean by integrating Universal Dependencies (UD) syntax with UniMorph morphology through the UniDive framework. It develops a Korean UniDive dataset that aligns K-UD and K-UniMorph, and implements a rule-based morphosyntactic feature extraction pipeline that enriches tokens with detailed features. In experiments with encoder-only (UDPipe) and decoder-only (instruction-tuned LLM) parsers, the authors show that explicit morphosyntactic features substantially improve parsing accuracy, especially for the encoder-only model, and that the approach extends to other languages such as Turkish. The work demonstrates the practical impact of unified morphosyntactic representations for parsing Korean and offers a scalable blueprint for morphologically informed parsing in other agglutinative languages.
Abstract
This paper introduces UniDive for Korean, an integrated framework that bridges Universal Dependencies (UD) and Universal Morphology (UniMorph) to enhance the representation and processing of Korean morphosyntax. Korean's rich inflectional morphology and flexible word order pose challenges for existing frameworks, which often treat morphology and syntax separately, leading to inconsistencies in linguistic analysis. UniDive unifies syntactic and morphological annotations by preserving syntactic dependencies while incorporating UniMorph-derived features, improving consistency in annotation. We construct an integrated dataset and apply it to dependency parsing, demonstrating that enriched morphosyntactic features enhance parsing accuracy, particularly in distinguishing grammatical relations influenced by morphology. Our experiments, conducted with both encoder-only and decoder-only models, confirm that explicit morphological information contributes to more accurate syntactic analysis.
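To make the idea of enriching tokens while preserving dependencies concrete, the following is a minimal sketch, not the authors' pipeline. It assumes CoNLL-U input whose LEMMA column lists morphemes joined by `+`, and it uses a tiny hypothetical rule table (`MORPHEME_RULES`) that maps a final morpheme to a UD-style feature. The function names and the rules are illustrative assumptions only.

```python
# Hypothetical toy rules: final morpheme -> morphosyntactic features.
# These are illustrative assumptions, not the rule set used in the paper.
MORPHEME_RULES = {
    "가": {"Case": "Nom"},
    "이": {"Case": "Nom"},
    "를": {"Case": "Acc"},
    "을": {"Case": "Acc"},
}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'A=x|B=y' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats):
    """Serialize a feature dict back to CoNLL-U, sorted by feature name."""
    if not feats:
        return "_"
    items = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in items)


def enrich_line(line):
    """Add rule-derived features to one token line; leave everything else as is.

    Only the FEATS column (index 5) is modified, so HEAD and DEPREL,
    i.e. the syntactic dependencies, are preserved unchanged.
    """
    cols = line.split("\t")
    if len(cols) != 10:  # comment or blank line: not a token line
        return line
    morphemes = cols[2].split("+")  # assumes '+'-segmented lemmas
    feats = parse_feats(cols[5])
    if len(morphemes) > 1:
        # Existing features win over rule-derived ones.
        for name, value in MORPHEME_RULES.get(morphemes[-1], {}).items():
            feats.setdefault(name, value)
    cols[5] = format_feats(feats)
    return "\t".join(cols)


if __name__ == "__main__":
    token = "1\t고양이가\t고양이+가\tNOUN\t_\t_\t2\tnsubj\t_\t_"
    print(enrich_line(token))
```

Reading the final morpheme from the segmented lemma, rather than matching on the surface form, avoids tagging a bare noun such as 고양이 as nominative merely because it ends in 이. A realistic rule set would also need to condition on part of speech and handle stacked particles.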
