Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction
Wei Li, Wen Luo, Guangyue Peng, Houfeng Wang
TL;DR
This work tackles multilingual grammatical error correction (GEC) by addressing a critical challenge: selecting effective in-context demonstrations. It introduces an explanation-based retrieval method that uses Grammatical Error Explanations (GEE) as both the query and the index for fetching demonstrations, with a multilingual GEE database constructed by LLMs. At inference time, a detection prompt generates an initial explanation for the test input. The top $k_E$ erroneous and top $k_C$ correct samples are then retrieved to form few-shot demonstrations, which are inserted into a standard English prompt to produce the corrected text. Experiments across five languages show consistent gains over semantic and BM25 baselines on both open- and closed-source LLMs. This indicates that matching error patterns via explanations is a robust, training-free approach, and it also generalizes to other datasets such as BEA-19.
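The retrieval-and-prompting step can be pictured with a minimal sketch. It assumes the test input's explanation (`query_gee`) has already been produced by the detection prompt, since that step needs an LLM and is not shown. It also assumes that every database sample, correct ones included, stores an explanation string. Explanation similarity here is a plain bag-of-words cosine, which stands in for whatever text encoder the authors actually use. The names `Sample`, `retrieve`, and `build_prompt`, the prompt wording, and the toy data are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    source: str       # original (possibly erroneous) sentence
    target: str       # corrected sentence
    explanation: str  # stored grammatical error explanation (GEE); hypothetical schema
    erroneous: bool   # True if the source contains an error


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a stand-in for a real sentence encoder."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query_gee: str, db: list[Sample], k_e: int, k_c: int) -> list[Sample]:
    """Return the k_e erroneous and k_c correct samples whose GEE best matches query_gee."""
    q = bag_of_words(query_gee)
    ranked = sorted(db, key=lambda s: cosine(q, bag_of_words(s.explanation)), reverse=True)
    erroneous = [s for s in ranked if s.erroneous][:k_e]
    correct = [s for s in ranked if not s.erroneous][:k_c]
    return erroneous + correct


def build_prompt(demos: list[Sample], test_input: str) -> str:
    """Format the demonstrations as an English few-shot GEC prompt (hypothetical wording)."""
    parts = ["Correct the grammatical errors in the input text."]
    for d in demos:
        parts.append(f"Input: {d.source}\nOutput: {d.target}")
    parts.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical toy database of German samples.
db = [
    Sample("Ich habe gestern ins Kino gegangen.", "Ich bin gestern ins Kino gegangen.",
           "Wrong auxiliary verb: 'gehen' takes 'sein' in the perfect tense.", True),
    Sample("Er hat ein rotes Auto.", "Er hat ein rotes Auto.",
           "The sentence is grammatically correct.", False),
    Sample("Die Kinder spielt im Garten.", "Die Kinder spielen im Garten.",
           "Subject-verb agreement error: plural subject needs a plural verb.", True),
]

# Pretend this explanation came from the detection prompt for the test input.
query = "Wrong auxiliary verb: 'laufen' takes 'sein' in the perfect tense."
demos = retrieve(query, db, k_e=1, k_c=1)
prompt = build_prompt(demos, "Ich habe gestern nach Hause gelaufen.")
```

In this sketch, the auxiliary-verb sample is selected as the erroneous demonstration. Its explanation, not its wording, matches the query, even though the agreement-error sample is just as plausible on surface form.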
Abstract
Grammatical error correction (GEC) aims to correct grammatical, spelling, and semantic errors in natural language text. With the growth of large language models (LLMs), direct text generation has gradually become the focus of GEC methods, and few-shot in-context learning offers a cost-effective solution. However, selecting effective in-context examples remains challenging, because similarity between input texts does not necessarily imply similar grammatical error patterns. In this paper, we propose a novel retrieval method based on natural language grammatical error explanations (GEE) to address this issue. Our method retrieves suitable few-shot demonstrations by matching the GEE of the test input with those of samples in a pre-constructed database, where the explanations for erroneous samples are generated by LLMs. We conducted multilingual few-shot GEC experiments on major open-source and closed-source LLMs. Experiments across five languages show that our method outperforms existing semantic and BM25-based retrieval techniques without requiring additional training or language adaptation. This suggests that matching error patterns is key to selecting examples.
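To show why input-text similarity can mislead, here is a minimal sketch of a BM25 retriever over raw input sentences, the kind of lexical baseline the abstract compares against. It implements standard Okapi BM25 with the commonly used defaults `k1=1.5` and `b=0.75`. These values, the tokenizer, and the toy sentences are assumptions, not the paper's settings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each doc for the query.

    Uses the IDF variant with +1 inside the log (as in Lucene) so IDF stays non-negative.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical example: the test input has a subject-verb agreement error.
test_input = "He go to school every day."
candidates = [
    "He goes to school every days.",  # lexically close, but the error is noun number
    "The dogs barks loudly.",         # no word overlap, but the same error type
]
scores = bm25_scores(test_input, candidates)
```

BM25 ranks the first candidate higher because it shares four words with the test input, even though only the second candidate exhibits the same subject-verb agreement error. Matching on explanations instead of raw text is meant to avoid this kind of mismatch.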
