Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction

Chenming Tang, Fanyi Qu, Yunfang Wu

TL;DR

Empirical results show that the proposed ungrammatical-syntax-based in-context example selection strategy for GEC outperforms commonly used word-matching and semantics-based methods across multiple LLMs, indicating that for a syntax-oriented task like GEC, paying more attention to syntactic information can effectively boost LLMs’ performance.

Abstract

In the era of large language models (LLMs), in-context learning (ICL) stands out as an effective prompting strategy that explores LLMs' potency across various tasks. However, applying LLMs to grammatical error correction (GEC) is still a challenging task. In this paper, we propose a novel ungrammatical-syntax-based in-context example selection strategy for GEC. Specifically, we measure similarity of sentences based on their syntactic structures with diverse algorithms, and identify optimal ICL examples sharing the most similar ill-formed syntax to the test input. Additionally, we carry out a two-stage process to further improve the quality of selection results. On benchmark English GEC datasets, empirical results show that our proposed ungrammatical-syntax-based strategies outperform commonly-used word-matching or semantics-based methods with multiple LLMs. This indicates that for a syntax-oriented task like GEC, paying more attention to syntactic information can effectively boost LLMs' performance. Our code will be publicly available after the publication of this paper.

Paper Structure (32 sections, 2 equations, 3 figures, 10 tables, 1 algorithm)

Figures (3)

  • Figure 1: Our two-stage selection and ICL workflow. For each test input, Stage I computes word-level similarity (BM25 or BERT representations) between the input and all training samples and selects the top $1000$ as candidates. Stage II then computes ungrammatical-syntax similarity (tree kernel or polynomial distance) between the input and each candidate and selects the $k$ most similar example(s). The input is appended after the $k$ examples to construct the prompt for LLM inference, and the LLM outputs the final result.
  • Figure 2: Original illustration of GOPar from Zhang et al. (2022) (zhang-etal-2022-syngec). $\emptyset$ denotes the missing word.
  • Figure 3: Example parse trees produced by GOPar and the Stanford Parser.
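
The two-stage workflow described in the Figure 1 caption can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: Stage I uses a small hand-written Okapi BM25 scorer, and Stage II substitutes a simple cosine over shared production rules for the paper's tree kernel or polynomial distance. The nested-tuple tree format, the function names (`bm25_scores`, `syntax_similarity`, `select_examples`, `build_prompt`), the toy pool, and the `Input:`/`Output:` prompt template are all hypothetical; in practice the trees would come from a parser such as GOPar.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Stage I scorer: Okapi BM25 of each tokenized doc against the query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in set(query):
            if t in tf:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def productions(tree):
    """Count (parent, child_labels) rules in a nested-tuple tree (assumed format)."""
    if isinstance(tree, str):  # a leaf label has no production
        return Counter()
    label, *children = tree
    rule = (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    counts = Counter([rule])
    for c in children:
        counts += productions(c)
    return counts

def syntax_similarity(t1, t2):
    """Stand-in for Stage II: cosine over production counts (NOT the paper's kernel)."""
    p1, p2 = productions(t1), productions(t2)
    dot = sum(p1[r] * p2[r] for r in p1)
    norm = math.sqrt(sum(v * v for v in p1.values()) * sum(v * v for v in p2.values()))
    return dot / norm if norm else 0.0

def select_examples(query_tokens, query_tree, pool, n_candidates=1000, k=2):
    """Stage I keeps the top-n lexical matches; Stage II reranks them by syntax."""
    scores = bm25_scores(query_tokens, [ex["tokens"] for ex in pool])
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    candidates = [pool[i] for i in ranked[:n_candidates]]
    candidates.sort(key=lambda ex: syntax_similarity(query_tree, ex["tree"]), reverse=True)
    return candidates[:k]

def build_prompt(examples, source):
    """Place the test input after the k selected examples (template is assumed)."""
    parts = [f"Input: {ex['source']}\nOutput: {ex['target']}" for ex in examples]
    parts.append(f"Input: {source}\nOutput:")
    return "\n\n".join(parts)

# Toy data: trees are hand-written over POS labels purely for illustration.
SIMPLE = ("S", ("NP", "PRP"), ("VP", "VB", ("PP", "TO", ("NP", "NN"))))
COORD = ("S", ("S", ("NP", "PRP"), ("VP", "VB", ("PP", "TO", ("NP", "NN")), ("NP", "NN"))),
         "CC", ("VP", "VB"))
OTHER = ("S", ("NP", "DT", "NN"), ("VP", "VBD", ("PP", "IN", ("NP", "DT", "NN"))))
pool = [
    {"tokens": "he go to school yesterday and play".split(), "tree": COORD,
     "source": "He go to school yesterday and play .",
     "target": "He went to school yesterday and played ."},
    {"tokens": "she go to work".split(), "tree": SIMPLE,
     "source": "She go to work .", "target": "She goes to work ."},
    {"tokens": "the cat sat on the mat".split(), "tree": OTHER,
     "source": "The cat sat on the mat .", "target": "The cat sat on the mat ."},
]
selected = select_examples("he go to school".split(), SIMPLE, pool, n_candidates=2, k=1)
prompt = build_prompt(selected, "He go to school .")
```

In this toy pool the first example has the highest word overlap with the query, but Stage II prefers the second example because its tree matches the query's structure exactly. That is the intended effect of reranking lexical candidates by syntax.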