Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction

Yinghui Li, Shang Qin, Haojing Huang, Yangning Li, Libo Qin, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu

TL;DR

This work addresses Chinese Grammatical Error Correction (CGEC) by rethinking how LLMs can contribute beyond direct correction. It introduces two frameworks. EXAM uses LLMs as explainers: they supply error types, reference corrections, and explanations that augment the training of small CGEC models. SEE uses LLMs as evaluators that judge edits flexibly and with semantic information, moving beyond the traditional minimum-change constraint. Experiments on NLPCC and NaCGEC show that EXAM consistently improves baselines, even with limited training data, and that SEE aligns more closely with human judgments than traditional metrics. The study points to practical benefits of collaboration between LLMs and small models: reduced reliance on large-scale LLMs, with lower cost and latency, and an evaluation framework that accounts for semantics and subjectivity in grammatical error correction.

Abstract

Recently, Large Language Models (LLMs) have been widely studied by researchers for their roles in various downstream NLP tasks. As a fundamental task in the NLP field, Chinese Grammatical Error Correction (CGEC) aims to correct all potential grammatical errors in the input sentences. Previous studies have shown that LLMs' performance as correctors on CGEC remains unsatisfactory because of the task's challenging focus. To help the CGEC field better adapt to the era of LLMs, we rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored in CGEC. Considering the rich grammatical knowledge stored in LLMs and their powerful semantic understanding capabilities, we utilize LLMs as explainers to provide explanation information for the CGEC small models during error correction to enhance performance. We also use LLMs as evaluators to bring more reasonable CGEC evaluations, thus alleviating the problems caused by the subjectivity of the CGEC task. In particular, our work is also an active exploration of how LLMs and small models can better collaborate in downstream tasks. Extensive experiments and detailed analyses on widely used datasets verify the effectiveness of our intuition and the proposed methods.

Paper Structure (35 sections, 3 equations, 8 figures, 6 tables)

This paper contains 35 sections, 3 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: The example of subjectivity and explainability of CGEC. The explanation is produced by ChatGPT.
  • Figure 2: Our designed frameworks of EXAM and SEE.
  • Figure 3: The comparison examples of evaluation.
  • Figure 4: Human evaluation results. The training data is 15K sampled HSK data. The test data is 200 sampled NLPCC data. The traditional metric is Char-$\text{F}_{0.5}$.
  • Figure 5: Few-shot results of LLMs on the word-level metric.
  • ...and 3 more figures
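
The traditional metric named in the Figure 4 caption, Char-$\text{F}_{0.5}$, is an F-score with $\beta = 0.5$, which weights precision more heavily than recall. The sketch below shows only the standard F-beta arithmetic on edit counts. It is not the paper's evaluation code: the function name `f_beta`, the counts in the usage note, and the choice to return 0.0 when no edits match are assumptions made for illustration, and character-level edit extraction is not shown.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Compute F-beta from edit counts (hypothetical helper, not the paper's scorer).

    tp: predicted edits that match a reference edit
    fp: predicted edits that match no reference edit
    fn: reference edits that the system missed
    beta < 1 weights precision more heavily than recall.
    """
    # Assumption: an empty denominator yields 0.0; some scorers use a
    # different convention when no edits are proposed.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With made-up counts of 6 matched, 2 spurious, and 4 missed edits, precision is 0.75 and recall is 0.6, so `f_beta(6, 2, 4)` is higher than the F1 score for the same counts, `f_beta(6, 2, 4, beta=1.0)`, because precision exceeds recall.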