Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction

Wei Li, Wen Luo, Guangyue Peng, Houfeng Wang

TL;DR

This work tackles multilingual grammatical error correction (GEC) by addressing the critical challenge of selecting effective in-context demonstrations. It introduces an explanation-based retrieval method that uses Grammatical Error Explanations (GEE) as both query and index to fetch demonstrations, building a multilingual GEE database via LLMs. At inference, a detection prompt generates an initial explanation for the test input, and the top $k_E$ erroneous and $k_C$ correct samples are retrieved to form few-shot demonstrations, which are then used in a standard English prompt to produce the corrected text. Experiments across five languages show consistent gains over semantic and BM25 baselines on both open- and closed-source LLMs, demonstrating that matching error patterns via explanations is a robust, training-free approach that generalizes to diverse datasets like BEA-19.

Abstract

Grammatical error correction (GEC) aims to correct grammatical, spelling, and semantic errors in natural language text. With the growth of large language models (LLMs), direct text generation has gradually become the focus of GEC methods, and few-shot in-context learning presents a cost-effective solution. However, selecting effective in-context examples remains challenging, as similarity between input texts does not necessarily correspond to similar grammatical error patterns. In this paper, we propose a novel retrieval method based on natural language grammatical error explanations (GEE) to address this issue. Our method retrieves suitable few-shot demonstrations by matching the GEE of the test input with that of pre-constructed database samples, where explanations for erroneous samples are generated by LLMs. We conducted multilingual few-shot GEC experiments on major open-source and closed-source LLMs. Experiments across five languages show that our method outperforms existing semantic and BM25-based retrieval techniques, without requiring additional training or language adaptation. This also suggests that matching error patterns is key to selecting examples.
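The retrieval step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Sample` class, the `retrieve` and `build_prompt` helpers, the toy database, and the prompt wording are all hypothetical. Token-level Jaccard overlap stands in for whatever similarity function the paper actually uses, and retrieving correct samples by input-text similarity is an assumption, since the text here only states that erroneous samples are retrieved by explanation.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    source: str       # sentence as written
    target: str       # corrected sentence (equals source for correct samples)
    explanation: str  # LLM-generated error explanation ("" for correct samples)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for the real retriever's scoring."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(query_text: str, query_explanation: str,
             erroneous_db: list[Sample], correct_db: list[Sample],
             k_e: int, k_c: int) -> list[Sample]:
    # Erroneous samples: match explanation against explanation.
    top_err = sorted(erroneous_db,
                     key=lambda s: jaccard(query_explanation, s.explanation),
                     reverse=True)[:k_e]
    # Correct samples: matched on input text (an assumption of this sketch).
    top_cor = sorted(correct_db,
                     key=lambda s: jaccard(query_text, s.source),
                     reverse=True)[:k_c]
    return top_err + top_cor


def build_prompt(demos: list[Sample], query_text: str) -> str:
    # Hypothetical prompt wording; the paper's actual prompt is not shown here.
    parts = ["Correct the grammatical errors in the input sentence."]
    parts += [f"Input: {d.source}\nOutput: {d.target}" for d in demos]
    parts.append(f"Input: {query_text}\nOutput:")
    return "\n\n".join(parts)


erroneous_db = [
    Sample("She go to school every day.", "She goes to school every day.",
           "subject-verb agreement error: the verb 'go' should be 'goes'"),
    Sample("I have visited Paris last year.", "I visited Paris last year.",
           "tense error: present perfect used with a past time expression"),
    Sample("He is good in math.", "He is good at math.",
           "preposition error: 'in' should be 'at'"),
]
correct_db = [
    Sample("The cat sleeps on the sofa.", "The cat sleeps on the sofa.", ""),
    Sample("They play football on Sundays.", "They play football on Sundays.", ""),
]

query_text = "The dog bark at strangers."
# In the described pipeline this initial explanation comes from a detection prompt.
query_explanation = "subject-verb agreement error: the verb 'bark' should be 'barks'"

demos = retrieve(query_text, query_explanation, erroneous_db, correct_db, k_e=1, k_c=1)
prompt = build_prompt(demos, query_text)
```

In this toy example the query sentence shares almost no words with "She go to school every day.", yet that sample is retrieved because its explanation describes the same subject-verb agreement pattern. This is the mismatch between input similarity and error-pattern similarity that the abstract points to.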

Paper Structure

This paper contains 24 sections, 7 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: The comparison between input-based demonstrations and explanation-based demonstrations for GEC. Samples with similar inputs do not necessarily contain the same grammatical error patterns. However, through preliminary checks and initial explanations, it is possible to retrieve samples with similar errors from a database indexed by grammatical error explanations, even if the semantics of the demonstrations differs significantly from the test input.
  • Figure 2: The proposed pipeline for few-shot GEC using the explanation-based demonstration retrieval method. As shown on the left side, we construct sample databases that include explanations. As illustrated on the right side, in the prediction stage, the erroneous samples for in-context demonstrations are retrieved using explanations.
  • Figure 3: GEC metrics on four datasets as the number of correct samples $k_C$ varies, with the total number of in-context demonstrations (erroneous plus correct) fixed at $k_E + k_C = 8$.
  • Figure 4: An example of the retrieval and generation results for Russian GEC.
  • Figure 5: An example of the retrieval and generation results for English GEC.