Chinese Spelling Correction: A Comprehensive Survey of Progress, Challenges, and Opportunities
Changchun Liu, Kai Zhang, Junzhe Jiang, Zixiao Kong, Qi Liu, Enhong Chen
TL;DR
This survey systematically analyzes Chinese Spelling Correction (CSC) from its rule-based and statistical beginnings to modern pre-trained language models (PLMs) and emerging large language models (LLMs), formalizing the task and detailing its architectural families (Information-Learning and Detector-Corrector). It surveys how phonetic (pinyin) and visual (glyph) character information, along with confusion sets, are learned and integrated, and reviews the key datasets and evaluation criteria used for benchmarking. The paper highlights persistent challenges in PLMs (overcorrection, generalization, consecutive errors), in LLMs (length control, overcorrection, phonetic reasoning), and in dataset quality, and proposes future directions that leverage alignment, retrieval-augmented approaches, and cross-domain data to improve CSC performance. Overall, it offers a roadmap for researchers to advance CSC, particularly by exploiting LLM reasoning and domain-adaptive data to achieve robust, scalable correction across diverse Chinese text domains.
Abstract
Chinese Spelling Correction (CSC) is a critical task in natural language processing that aims to detect and correct spelling errors in Chinese text. This survey provides a comprehensive overview of CSC, tracing its evolution from pre-trained language models (PLMs) to large language models (LLMs) and critically analyzing their respective strengths and weaknesses. We also present a detailed examination of existing benchmark datasets, highlighting their inherent challenges and limitations. Finally, we propose promising future research directions, focusing in particular on leveraging LLMs and their reasoning capabilities to improve CSC performance. To the best of our knowledge, this is the first comprehensive survey dedicated to the field of CSC. We believe this work will serve as a valuable resource for researchers, fostering a deeper understanding of the field and inspiring future advancements.
