"When Data is Scarce, Prompt Smarter"... Approaches to Grammatical Error Correction in Low-Resource Settings

Somsubhra De, Harsh Kumar, Arun Prakash A

TL;DR

Framing GEC for low-resource Indic languages as a sequence-to-sequence task, the study compares zero-shot and few-shot prompting of state-of-the-art LLMs (GPT-4.1, Gemini-2.5, LLaMA-4) against a LoRA-fine-tuned Sarvam-M 24B Hindi model on the five-language Indic-GEC dataset from Bhasha. Prompting-based approaches generally outperform the fine-tuned baseline, achieving leading GLEU scores in Tamil and Hindi and competitive results in Malayalam, Bengali, and Telugu; tokenizer choice also significantly affects evaluation reliability. The work highlights the strong multilingual generalization of modern LLMs for GEC and shows that careful prompt design and lightweight adaptation can bridge resource gaps. Proposed future directions are cross-lingual transfer, multilingual joint fine-tuning, and tokenizer optimization across diverse Indic scripts.

Abstract

Grammatical error correction (GEC) is an important task in Natural Language Processing that aims to automatically detect and correct grammatical mistakes in text. While recent advances in transformer-based models and large annotated datasets have greatly improved GEC performance for high-resource languages such as English, this progress has not extended equally to other languages. For most Indic languages, GEC remains a challenging task due to limited resources, linguistic diversity, and complex morphology. In this work, we explore prompting-based approaches using state-of-the-art large language models (LLMs), such as GPT-4.1, Gemini-2.5, and LLaMA-4, combined with a few-shot strategy to adapt them to low-resource settings. We observe that even basic prompting strategies, such as zero-shot and few-shot approaches, enable these LLMs to substantially outperform fine-tuned Indic-language models such as Sarvam-M 24B, illustrating the strong multilingual generalization capabilities of contemporary LLMs for GEC. Our experiments show that carefully designed prompts and lightweight adaptation significantly enhance correction quality across multiple Indic languages. We achieved leading results in the shared task, ranking 1st in Tamil (GLEU: 91.57) and Hindi (GLEU: 85.69), 2nd in Telugu (GLEU: 85.22), 4th in Bangla (GLEU: 92.86), and 5th in Malayalam (GLEU: 92.97). These findings highlight the effectiveness of prompt-driven NLP techniques and underscore the potential of large-scale LLMs to bridge resource gaps in multilingual GEC.
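
To make the difference between zero-shot and few-shot prompting concrete, the sketch below assembles a GEC prompt with or without in-context examples. It is an illustrative assumption, not the authors' prompt: the function name `build_gec_prompt`, the instruction wording, and the placeholder example pairs are all hypothetical, and no model API is called.

```python
from typing import Sequence, Tuple


def build_gec_prompt(
    sentence: str,
    language: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a GEC prompt (hypothetical wording, not the paper's prompt).

    With no examples the prompt is zero-shot; each (erroneous, corrected)
    pair added to `examples` turns it into a few-shot prompt.
    """
    lines = [
        f"Correct the grammatical errors in the following {language} sentence.",
        "If the sentence is already correct, return it unchanged.",
        "Reply with the corrected sentence and nothing else.",
        "",
    ]
    # In-context demonstrations: erroneous input followed by its correction.
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    # The sentence to correct; the model is expected to complete "Output:".
    lines += [f"Input: {sentence}", "Output:"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder strings stand in for real dataset pairs.
    demo_pairs = [
        ("<erroneous sentence 1>", "<corrected sentence 1>"),
        ("<erroneous sentence 2>", "<corrected sentence 2>"),
    ]
    print(build_gec_prompt("<test sentence>", "Tamil", demo_pairs))
```

Passing an empty `examples` sequence yields the zero-shot variant, so the same function covers both settings compared in the study.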

Paper Structure

This paper contains 15 sections, 6 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Examples from the GEC task dataset, shown as pairs of input sentence ✗ (with errors in red) and ground truth ✓ (with corrections in blue). Error types are annotated based on our understanding.
  • Figure 2: Comparison of model outputs on a multi-correction BN example (Transl. Knowing only a little about something and assuming that I know everything can be more dangerous than not knowing at all.) from the test set. Gemini's output fully aligns with the gold standard, while GPT omits one necessary edit.
  • Figure 3: L: Distribution of test-set cases where no corrections are needed. R: How well the models followed the instruction not to edit when no changes were required.
  • Figure 4: Tokenization density across the three architectures.
  • Figure 5: Breakdown of tokenization in GPT-4.1-mini vs. Llama-4-Maverick, for the TEL input. (Gemini 2.5 Flash API does not currently expose token-level breakdown information.)