Revisiting Classification Taxonomy for Grammatical Errors

Deqing Zou, Jingheng Ye, Yulu Liu, Yu Wu, Zishan Xu, Yinghui Li, Hai-Tao Zheng, Bingxu An, Zhao Wei, Yong Xu

TL;DR

This paper tackles the lack of rigorous validation in the grammatical error classification taxonomies used by language learning systems. It proposes a multi-metric evaluation framework spanning Exclusivity, Coverage, Balance, and Usability, and applies it to a high-quality dataset annotated under four taxonomies (POL73, TUC74, BRY17, and FEI23) through an LLM–human collaborative process. The experiments reveal trade-offs among taxonomies (e.g., high Exclusivity and Coverage can come at the expense of Balance and Usability) and show that merging categories can increase Coverage but reduce Balance and alter Exclusivity. The work provides practical guidance for error analysis in language learning and motivates more rigorous, model-aware taxonomy design.

Abstract

Grammatical error classification plays a crucial role in language learning systems, but existing classification taxonomies often lack rigorous validation, leading to inconsistencies and unreliable feedback. In this paper, we revisit previous classification taxonomies for grammatical errors by introducing a systematic and qualitative evaluation framework. Our approach examines four aspects of a taxonomy: exclusivity, coverage, balance, and usability. We then construct a high-quality grammatical error classification dataset annotated with multiple classification taxonomies and evaluate those taxonomies using our proposed framework. Our experiments reveal the drawbacks of existing taxonomies. Our contributions aim to improve the precision and effectiveness of error analysis, providing more understandable and actionable feedback for language learners.

Paper Structure

This paper contains 36 sections, 7 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of the POL73 Error Classification Taxonomy. The vertical ellipsis indicates that the category has additional subcategories not fully expanded here.
  • Figure 2: Overview of the TUC74 Error Classification Taxonomy. The horizontal ellipsis indicates that the category has additional subcategories not fully expanded here.
  • Figure 3: Overview of the BRY17 Error Classification Taxonomy.
  • Figure 4: Overview of the FEI23 Error Classification Taxonomy.