On the (In)Effectiveness of Large Language Models for Chinese Text Correction

Yinghui Li; Haojing Huang; Shirong Ma; Yong Jiang; Yangning Li; Feng Zhou; Hai-Tao Zheng; Qingyu Zhou

TL;DR

This work empirically finds that current LLMs show both impressive performance and unsatisfactory behavior on Chinese Text Correction, based on an evaluation of various representative LLMs on the Chinese Grammatical Error Correction and Chinese Spelling Check tasks.

Abstract

Recently, the development and progress of Large Language Models (LLMs) have amazed the entire Artificial Intelligence community. Benefiting from their emergent abilities, LLMs have attracted more and more researchers to study their capabilities and performance on various downstream Natural Language Processing (NLP) tasks. While marveling at LLMs' incredible performance on all kinds of tasks, we notice that they also have excellent multilingual processing capabilities, including for Chinese. To explore the Chinese processing ability of LLMs, we focus on Chinese Text Correction, a fundamental and challenging Chinese NLP task. Specifically, we evaluate various representative LLMs on the Chinese Grammatical Error Correction (CGEC) and Chinese Spelling Check (CSC) tasks, which are two main Chinese Text Correction scenarios. We also fine-tune LLMs for Chinese Text Correction to better observe their potential capabilities. From extensive analyses and comparisons with previous state-of-the-art small models, we empirically find that current LLMs exhibit both impressive performance and unsatisfactory behavior on Chinese Text Correction. We believe our findings will promote the deployment and application of LLMs in the Chinese NLP community.

Paper Structure (13 sections, 4 figures, 11 tables)

Introduction
Methodology
Task-specific Prompts
In-context Learning Strategies
Supervised Instruction Tuning
Experiments
Experimental Settings
Main Results
Human Evaluation
Analyses and Discussions
Case Study
Related Work
Conclusion

Figures (4)

Figure 1: Task-specific prompts of the CSC (中文拼写纠错) and CGEC (中文语法纠错) tasks. In our study, we try different ChatGPT base models, such as text-davinci-003 and gpt-3.5-turbo, and other LLMs. We mark the key information related to the task characteristics in the prompt in red.
Figure 2: In-context learning experiments on the CSC task, plotted by correction $F_1$ score. The * marks models using the "Select correct and erroneous samples" in-context learning strategy. The # marks models using the "Select hard erroneous samples" in-context learning strategy.
Figure 3: In-context learning experiments on the CGEC task, plotted by $F_{0.5}$ score. The * marks models using the "Select correct and erroneous samples" in-context learning strategy. The # marks models using the "Select hard erroneous samples" in-context learning strategy.
Figure 4: The impact of sentence length on model performance on the CSC task, plotted by correction $F_1$ score.
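
The captions above report a correction $F_1$ score for CSC and an $F_{0.5}$ score for CGEC. Both are instances of the standard $F_\beta$ measure, which weights precision more heavily than recall when $\beta < 1$. The following is a minimal sketch of that formula only. The function name `f_beta` is hypothetical, the input values are illustrative and not results from the paper, and how precision and recall are counted (for example, per sentence or per edit) is not specified in this section.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta).

    beta = 1.0 gives F1; beta = 0.5 gives F0.5, which favors precision.
    """
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0, beta must be > 0")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Illustrative values only (not taken from the paper).
f1 = f_beta(0.8, 0.4, beta=1.0)
f05 = f_beta(0.8, 0.4, beta=0.5)
```

With precision higher than recall, as in the illustrative values, $F_{0.5}$ comes out higher than $F_1$, which reflects the usual preference in grammatical error correction for avoiding wrong edits over catching every error.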
