A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models

Houquan Zhou, Zhenghua Li, Bo Zhang, Chen Li, Shaopeng Lai, Ji Zhang, Fei Huang, Min Zhang

TL;DR

This work tackles Chinese spelling correction (CSC) without task-specific prompts or model fine-tuning. It combines a minimal distortion model, which captures pronunciation- and glyph-based errors, with the pure language-model probability of an LLM to guide corrections, and introduces a length reward and a faithfulness reward to balance output fluency against fidelity to the input. The approach is evaluated on five public CSC datasets, where it shows strong cross-domain generalization, outperforms prompt-based baselines, and is competitive with domain-general state-of-the-art models, approaching the state of the art on several domains. The results demonstrate the practicality and scalability of training-free, prompt-free CSC with LLMs and point to the potential for applying similar strategies to other languages and error-correction tasks, subject to considerations of computational cost and knowledge injection through input prefixes.

Abstract

This work proposes a simple training-free, prompt-free approach to leveraging large language models (LLMs) for the Chinese spelling correction (CSC) task, which differs fundamentally from all previous CSC approaches. The key idea is to use an LLM as a pure language model in the conventional manner. The LLM goes through the input sentence from the beginning and, at each inference step, produces a distribution over its vocabulary for deciding the next token, given a partial sentence. To ensure that the output sentence remains faithful to the input sentence, we design a minimal distortion model that utilizes pronunciation or shape similarities between the original and replaced characters. Furthermore, we propose two useful reward strategies to address practical challenges specific to the CSC task. Experiments on five public datasets demonstrate that our approach significantly improves LLM performance, enabling LLMs to compete with state-of-the-art domain-general CSC models.
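
The decoding idea described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: a hand-written bigram table (`BIGRAM`) stands in for the LLM, the confusion set (`SAME_PINYIN`) and all probability values are invented for illustration, the additive log-score is only one plausible way to combine the two models, and the two reward strategies are omitted because their formulas are not given here.

```python
import math

# Hypothetical confusion set: characters that share a pronunciation (toy data).
SAME_PINYIN = {"师": {"施"}, "施": {"师"}, "公": {"工"}, "工": {"公"}}

# Assumed distortion probabilities per edit type (illustrative values only).
P_KEEP = 0.96         # the input character is kept unchanged
P_SAME_PINYIN = 0.03  # the input character is replaced by a homophone

# Toy bigram table standing in for the LLM's next-token distribution.
BIGRAM = {
    ("要", "求"): 0.5, ("求", "师"): 0.1, ("求", "施"): 0.1,
    ("施", "工"): 0.8, ("师", "公"): 0.001,
    ("工", "单"): 0.3, ("公", "单"): 0.01, ("单", "位"): 0.9,
}
UNSEEN = 1e-3  # back-off probability for bigrams missing from the table


def lm_logp(prefix: str, ch: str) -> float:
    """Log-probability of `ch` given the partial output `prefix` (toy LM)."""
    prev = prefix[-1] if prefix else "<s>"
    return math.log(BIGRAM.get((prev, ch), UNSEEN))


def distortion_logp(src: str, tgt: str) -> float:
    """Log-probability that output character `tgt` was typed as `src`."""
    if src == tgt:
        return math.log(P_KEEP)
    if tgt in SAME_PINYIN.get(src, ()):
        return math.log(P_SAME_PINYIN)
    return -math.inf  # unrelated characters are never proposed


def correct(sentence: str, beam_size: int = 2) -> str:
    """Left-to-right beam search over same-length candidate sentences."""
    beams = [("", 0.0)]
    for src in sentence:
        # Candidates: the input character itself plus its homophones.
        candidates = sorted({src} | SAME_PINYIN.get(src, set()))
        expanded = [
            (prefix + tgt,
             score + lm_logp(prefix, tgt) + distortion_logp(src, tgt))
            for prefix, score in beams
            for tgt in candidates
        ]
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_size]
    return beams[0][0]


print(correct("要求师公单位", beam_size=2))
print(correct("要求师公单位", beam_size=1))
```

In this toy setting the correction "施工" is recovered with a beam of two but lost with a beam of one, because the distortion model initially favors keeping "师" and the language model only rewards "施" once the following character is seen. The sketch scores single characters; a real LLM vocabulary contains multi-character tokens, which this example does not model.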

Paper Structure

This paper contains 66 sections, 11 equations, 5 figures, and 21 tables.

Figures (5)

  • Figure 1: An illustration of our approach. The correct sentence should be "明天就是周末了,又可以跟朋友出去玩了。" (Tomorrow is the weekend, allowing for going out to play with friends again.).
  • Figure 2: A real example of the decoding process for the input sentence "要求师公单位对..." (Requesting the master unit to ...). Here, "施工" (shīgōng, construction) is misspelled as "师公" (shīgōng). Without the length reward, the correct character "施" fails to be selected into the beam.
  • Figure 3: A real example of the probabilities for the next token, given the partial sequence "小明想去" from the sentence "小明想去宿州" (Xiaoming wants to go to Suzhou, Anhui).
  • Figure 4: Prompt templates used in our FSP and ZSP baselines.
  • Figure 5: The scores of Baichuan2 7B with different beam sizes. The solid lines represent the results of our approach, and the dashed lines represent the results of the few-shot baseline. We can observe that larger beam sizes may lead to worse C-F scores in few-shot settings.