Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction

Chenming Tang; Fanyi Qu; Yunfang Wu

TL;DR

Empirical results show that the proposed ungrammatical-syntax-based in-context example selection strategy for GEC outperforms commonly used word-matching or semantics-based methods with multiple LLMs, indicating that for a syntax-oriented task like GEC, paying more attention to syntactic information can effectively boost LLMs’ performance.

Abstract

In the era of large language models (LLMs), in-context learning (ICL) stands out as an effective prompting strategy that explores LLMs' potency across various tasks. However, applying LLMs to grammatical error correction (GEC) is still a challenging task. In this paper, we propose a novel ungrammatical-syntax-based in-context example selection strategy for GEC. Specifically, we measure similarity of sentences based on their syntactic structures with diverse algorithms, and identify optimal ICL examples sharing the most similar ill-formed syntax to the test input. Additionally, we carry out a two-stage process to further improve the quality of selection results. On benchmark English GEC datasets, empirical results show that our proposed ungrammatical-syntax-based strategies outperform commonly-used word-matching or semantics-based methods with multiple LLMs. This indicates that for a syntax-oriented task like GEC, paying more attention to syntactic information can effectively boost LLMs' performance. Our code will be publicly available after the publication of this paper.

Paper Structure (32 sections, 2 equations, 3 figures, 10 tables, 1 algorithm)

Introduction
Related Work
Grammatical Error Correction
Syntactic Similarity
Large Language Models and In-context Learning
Preliminaries
Syntax Parser for Ungrammatical Sentences
Syntactic Similarity with Tree Kernel
Syntactic Similarity with Polynomial Distance
Methodology
In-context Learning Workflow for GEC
Ungrammatical-syntax-based Selection
Weighting Ungrammatical Nodes with Polynomial Distance
Two-stage Selection
Stage 1: BM25/BERT Selection
...and 17 more sections

Figures (3)

Figure 1: Our two-stage selection and ICL workflow. For each input test sample, Stage I computes word similarities with BM25 or BERT representations between the input and all training data and selects the top-$1000$ as candidates. Then, Stage II computes ungrammatical syntactic similarities with tree kernel or polynomial distance between the input and the candidates to select the $k$ most similar example(s). After that, we append the input to the $k$ examples to construct the prompt for LLM inference. In the end, the LLM outputs the final result.
Figure 2: Original illustration of GOPar from Zhang et al. (2022). $\emptyset$ denotes the missing word.
Figure 3: An example of parse trees produced by GOPar and the Stanford Parser.
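
The two-stage workflow described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Stage I uses a plain Okapi BM25 scorer written inline, and Stage II uses a toy Jaccard overlap over adjacent syntactic labels as a stand-in for the tree kernel or polynomial distance named in the paper. The function names, the `tokens`/`syntax`/`source`/`target` fields, and the `Input:`/`Output:` prompt template are all hypothetical, and the syntactic labels are assumed to come from a parser for ungrammatical sentences that has already been run.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Stage I scorer: Okapi BM25 of each tokenized doc against the query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of docs containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_syntax_similarity(labels_a, labels_b):
    """Stand-in for Stage II (NOT a tree kernel): Jaccard overlap of
    adjacent-label pairs from two hypothetical label sequences."""
    a = set(zip(labels_a, labels_a[1:]))
    b = set(zip(labels_b, labels_b[1:]))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_examples(test, pool, k=2, n_candidates=1000):
    """Stage I keeps the top-n_candidates by BM25; Stage II re-ranks them
    by syntactic similarity and returns the k most similar examples."""
    scores = bm25_scores(test["tokens"], [ex["tokens"] for ex in pool])
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    candidates = [pool[i] for i in ranked[:n_candidates]]
    reranked = sorted(
        candidates,
        key=lambda ex: toy_syntax_similarity(test["syntax"], ex["syntax"]),
        reverse=True,
    )
    return reranked[:k]


def build_prompt(examples, source):
    """Place the test input after the k selected examples
    (the template itself is an assumption)."""
    parts = [f"Input: {ex['source']}\nOutput: {ex['target']}" for ex in examples]
    parts.append(f"Input: {source}\nOutput:")
    return "\n\n".join(parts)
```

In this sketch the cheap lexical filter bounds the cost of the more expensive syntactic comparison, which is then run only on the surviving candidates. Swapping `toy_syntax_similarity` for a real tree kernel or polynomial distance would leave `select_examples` unchanged.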
