Revisiting Classification Taxonomy for Grammatical Errors
Deqing Zou, Jingheng Ye, Yulu Liu, Yu Wu, Zishan Xu, Yinghui Li, Hai-Tao Zheng, Bingxu An, Zhao Wei, Yong Xu
TL;DR
This paper tackles the lack of rigorous validation in grammatical error classification taxonomies used by language learning systems. It proposes a multi-metric evaluation framework spanning $Exclusivity$, $Coverage$, $Balance$, and $Usability$, and applies it to a high-quality dataset annotated with four taxonomies (POL73, TUC74, BRY17, and FEI23) through an LLM–human collaborative process. The experiments reveal trade-offs among taxonomies (e.g., high $Exclusivity$ and $Coverage$ can come at the expense of $Balance$ and $Usability$), and show that merging categories can increase $Coverage$ but reduce $Balance$ and alter $Exclusivity$. The work provides practical guidance for error analysis in language learning and motivates more rigorous, model-aware taxonomy design.
Abstract
Grammatical error classification plays a crucial role in language learning systems, but existing classification taxonomies often lack rigorous validation, leading to inconsistencies and unreliable feedback. In this paper, we revisit previous classification taxonomies for grammatical errors by introducing a systematic and qualitative evaluation framework. Our approach examines four aspects of a taxonomy: exclusivity, coverage, balance, and usability. We then construct a high-quality grammatical error classification dataset annotated with multiple classification taxonomies and evaluate those taxonomies using our proposed framework. Our experiments reveal the drawbacks of existing taxonomies. Our contributions aim to improve the precision and effectiveness of error analysis, providing more understandable and actionable feedback for language learners.
