Leveraging Prompt-Tuning for Bengali Grammatical Error Explanation Using Large Language Models

Subhankar Maity, Aniket Deroy

TL;DR

This work addresses Bengali Grammatical Error Explanation (BGEE) with a three-step prompt-tuning approach in which a large language model identifies and categorizes the errors in a sentence, generates a corrected sentence, and explains each error in natural language. The method is organized as a modular pipeline (EICM, SCM, EEGM) and is evaluated on a Bengali dataset augmented with expert explanations, using both automated metrics and human judgments. GPT-4 with prompt-tuning is the strongest performer: compared to the baseline, it improves F1 by 5.26% and exact match by 6.95%, and it reduces wrong error types by 25.51% and wrong error explanations by 26.27%, though it still lags behind human experts. The study demonstrates the practical potential of prompt-tuned LLMs for educational feedback in low-resource languages and points to the remaining gap to human-level performance.

Abstract

We propose a novel three-step prompt-tuning method for Bengali Grammatical Error Explanation (BGEE) using state-of-the-art large language models (LLMs) such as GPT-4, GPT-3.5 Turbo, and Llama-2-70b. Our approach involves identifying and categorizing grammatical errors in Bengali sentences, generating corrected versions of the sentences, and providing natural language explanations for each identified error. We evaluate the performance of our BGEE system using both automated evaluation metrics and human evaluation conducted by experienced Bengali language experts. With our proposed prompt-tuning approach, GPT-4, the best-performing LLM, surpasses the baseline model in automated evaluation metrics, with a 5.26% improvement in F1 score and a 6.95% improvement in exact match. Furthermore, compared to the previous baseline, GPT-4 shows a decrease of 25.51% in wrong error types and a decrease of 26.27% in wrong error explanations. However, the results still lag behind the human baseline.
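
The three steps described in the abstract can be pictured as a chain of prompts. The following is only a minimal sketch, not the authors' implementation: the `LLM` callable, the `GEEOutput` container, the function names, the prompt wording, and the way each step feeds the next are all assumptions made for illustration. Only the module names (EICM, SCM, EEGM) and their order come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any callable that maps a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class GEEOutput:
    error_types: List[str]  # categories reported by the EICM step
    corrected: str          # sentence produced by the SCM step
    explanation: str        # explanation produced by the EEGM step


def run_eicm(llm: LLM, sentence: str, allowed_types: List[str]) -> List[str]:
    """Error identification and categorization (prompt wording is illustrative)."""
    prompt = (
        "Step 1 (EICM): list the grammatical error types in this Bengali sentence.\n"
        f"Allowed types: {', '.join(allowed_types)}\n"
        f"Sentence: {sentence}\n"
        "Answer with a comma-separated list."
    )
    # Split the reply on commas and drop empty fragments.
    return [t.strip() for t in llm(prompt).split(",") if t.strip()]


def run_scm(llm: LLM, sentence: str, error_types: List[str]) -> str:
    """Sentence correction, conditioned on the error types found in step 1."""
    prompt = (
        "Step 2 (SCM): rewrite the Bengali sentence with its errors corrected.\n"
        f"Error types: {', '.join(error_types)}\n"
        f"Sentence: {sentence}"
    )
    return llm(prompt).strip()


def run_eegm(llm: LLM, sentence: str, corrected: str, error_types: List[str]) -> str:
    """Error explanation, conditioned on the outputs of both earlier steps."""
    prompt = (
        "Step 3 (EEGM): explain each error in natural language.\n"
        f"Error types: {', '.join(error_types)}\n"
        f"Original: {sentence}\n"
        f"Corrected: {corrected}"
    )
    return llm(prompt).strip()


def explain_sentence(llm: LLM, sentence: str, allowed_types: List[str]) -> GEEOutput:
    """Run the three steps in order and collect their outputs."""
    error_types = run_eicm(llm, sentence, allowed_types)
    corrected = run_scm(llm, sentence, error_types)
    explanation = run_eegm(llm, sentence, corrected, error_types)
    return GEEOutput(error_types, corrected, explanation)
```

Any model client can be plugged in by wrapping it in a function that takes a prompt string and returns the model's text. Keeping the steps as separate functions makes it possible to inspect or evaluate each module's output on its own.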

Paper Structure

This paper contains 7 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of the proposed LLM prompt-tuning strategy. LLM denotes Large Language Model, EICM denotes the Error Identification and Categorization Module, SCM denotes the Sentence Correction Module, and EEGM denotes the Error Explanation Generation Module. The prompt fed to the LLM is denoted by “ ”. Definitions of the input notations (e.g., $P_{\text{types}}$, $E_{\text{types}}$) are given in Section \ref{meth}.
  • Figure 2: Example of an erroneous Bengali sentence (containing a spelling error) with the GEE outputs of GPT-4 (w/ PT) and of the baseline, GPT-4 Turbo (w/o PT). PT denotes prompt-tuned. “_” in the gloss marks the spelling error in the Bengali word.