A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models
Houquan Zhou, Zhenghua Li, Bo Zhang, Chen Li, Shaopeng Lai, Ji Zhang, Fei Huang, Min Zhang
TL;DR
This work tackles Chinese spelling correction (CSC) without task-specific prompts or model fine-tuning. It combines a minimal distortion model, which captures pronunciation-based and glyph-based errors, with the pure language model probability of an LLM to guide corrections. A length reward and a faithfulness reward balance output fluency against fidelity to the input. Evaluated on five public CSC datasets, the approach shows strong cross-domain generalization: it outperforms prompt-based baselines, is competitive with domain-general state-of-the-art models, and approaches the state of the art in several domains. The results demonstrate that training-free, prompt-free CSC with LLMs is practical and scalable, and they point to the potential of applying similar strategies to other languages and error correction tasks, subject to considerations of computational cost and of knowledge injection through input prefixes.
Abstract
This work proposes a simple training-free, prompt-free approach to leveraging large language models (LLMs) for the Chinese spelling correction (CSC) task, which differs fundamentally from previous CSC approaches. The key idea is to use an LLM as a pure language model in the conventional manner. The LLM reads the input sentence from the beginning and, at each inference step, produces a distribution over its vocabulary for deciding the next token given the partial sentence. To ensure that the output sentence remains faithful to the input sentence, we design a minimal distortion model that exploits pronunciation or shape similarities between the original and replaced characters. Furthermore, we propose two reward strategies to address practical challenges specific to the CSC task. Experiments on five public datasets demonstrate that our approach significantly improves the performance of LLMs, enabling them to compete with state-of-the-art domain-general CSC models.
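
The decoding idea described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate output character is scored by the sum of a language model log-probability and a distortion log-probability, and that a small beam search keeps the best partial outputs. The bigram table, the confusion set, the probability values, and the function names are all hypothetical stand-ins chosen for illustration: a real system would take next-token probabilities from an LLM, and the two reward strategies are not modeled here.

```python
import math

# Hypothetical toy language model: bigram probabilities standing in for an LLM.
BIGRAM = {
    ("<s>", "我"): 0.5,
    ("我", "在"): 0.4,
    ("我", "再"): 0.05,
    ("在", "家"): 0.3,
    ("再", "家"): 0.01,
}
VOCAB = ["我", "在", "再", "家"]

# Hypothetical confusion set: pairs of characters with the same pronunciation.
SIMILAR = {frozenset(("在", "再"))}

# Hypothetical distortion probabilities (illustrative values only).
DISTORTION_LOGP = {
    "identical": math.log(0.96),
    "similar": math.log(0.03),
    "other": math.log(0.01),
}


def lm_logprob(prev: str, ch: str) -> float:
    """Log-probability of `ch` following `prev`, with a small floor."""
    return math.log(BIGRAM.get((prev, ch), 1e-4))


def distortion_logprob(x: str, y: str) -> float:
    """Log-probability of observing input character x given output character y."""
    if x == y:
        return DISTORTION_LOGP["identical"]
    if frozenset((x, y)) in SIMILAR:
        return DISTORTION_LOGP["similar"]
    return DISTORTION_LOGP["other"]


def correct(sentence: str, beam_size: int = 3) -> str:
    """Beam search over output characters, one per input character."""
    beams = [("", 0.0)]
    for x in sentence:
        candidates = []
        for out, score in beams:
            prev = out[-1] if out else "<s>"
            for y in VOCAB:
                step = lm_logprob(prev, y) + distortion_logprob(x, y)
                candidates.append((out + y, score + step))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0][0]


if __name__ == "__main__":
    print(correct("我再家"))
    print(correct("我在家"))
```

With these toy numbers, the misspelled input 我再家 is mapped to 我在家, while the already correct input 我在家 is left unchanged. The distortion term is what keeps the output close to the input: an unrelated replacement pays a larger penalty than a phonetically similar one, so the language model can only override the input where the alternative is both plausible and similar.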
