Leveraging Prompt-Tuning for Bengali Grammatical Error Explanation Using Large Language Models
Subhankar Maity, Aniket Deroy
TL;DR
This work addresses Bengali Grammatical Error Explanation (BGEE) with a three-step prompt-tuning approach in which large language models identify error types, generate corrected sentences, and provide natural language explanations. The method is organized as a modular pipeline (EICM, SCM, EEGM). It is evaluated on a BGEE dataset augmented with expert explanations, using both automated metrics and human judgments. GPT-4 with prompt-tuning is the strongest performer. Relative to the baseline, it improves F1 by 5.26% and exact match by 6.95%, and it reduces human-judged wrong error types by 25.51% and wrong error explanations by 26.27%. It still lags behind human experts. The study demonstrates the practical potential of prompt-tuned LLMs for educational feedback in low-resource languages and highlights avenues for narrowing the remaining gap to human-level performance.
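The three-step pipeline can be pictured as prompt chaining, where each step's output feeds the next prompt. The sketch below is illustrative only. It assumes, from the order in which the modules are listed, that EICM handles error identification and categorization, SCM handles sentence correction, and EEGM handles explanation generation. The prompt wording, the `LLM` callable type, and the offline `fake_llm` stand-in are hypothetical and are not the authors' actual prompts or code.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that sends a prompt to an LLM and returns its reply.
LLM = Callable[[str], str]


@dataclass
class BGEEResult:
    error_types: List[str]  # output of step 1
    corrected: str          # output of step 2
    explanation: str        # output of step 3


def identify_errors(llm: LLM, sentence: str) -> List[str]:
    """Step 1 (assumed EICM): list and categorize the grammatical errors."""
    prompt = (
        "Step 1: Identify and categorize every grammatical error in the "
        "Bengali sentence below. Return one error type per line.\n"
        f"Sentence: {sentence}"
    )
    reply = llm(prompt)
    # Keep each non-empty line of the reply as one error type.
    return [line.strip() for line in reply.splitlines() if line.strip()]


def correct_sentence(llm: LLM, sentence: str, error_types: List[str]) -> str:
    """Step 2 (assumed SCM): rewrite the sentence with the errors fixed."""
    prompt = (
        "Step 2: Correct the Bengali sentence below, fixing these errors: "
        f"{', '.join(error_types)}. Return only the corrected sentence.\n"
        f"Sentence: {sentence}"
    )
    return llm(prompt).strip()


def explain_errors(llm: LLM, sentence: str, corrected: str,
                   error_types: List[str]) -> str:
    """Step 3 (assumed EEGM): explain each error in natural language."""
    prompt = (
        "Step 3: For each error type, explain in natural language why the "
        "original is wrong and how the correction fixes it.\n"
        f"Original: {sentence}\nCorrected: {corrected}\n"
        f"Error types: {', '.join(error_types)}"
    )
    return llm(prompt).strip()


def run_bgee(llm: LLM, sentence: str) -> BGEEResult:
    """Chain the three steps, passing each step's output into the next."""
    error_types = identify_errors(llm, sentence)
    corrected = correct_sentence(llm, sentence, error_types)
    explanation = explain_errors(llm, sentence, corrected, error_types)
    return BGEEResult(error_types, corrected, explanation)


def fake_llm(prompt: str) -> str:
    """Hypothetical offline stand-in for a real model: canned reply per step."""
    if prompt.startswith("Step 1"):
        return "Subject-verb agreement\n"
    if prompt.startswith("Step 2"):
        return "আমি ভাত খাই।"
    return "খায় is a third-person verb form; the first-person subject আমি requires খাই."


if __name__ == "__main__":
    result = run_bgee(fake_llm, "আমি ভাত খায়।")
    print(result.error_types)
```

Swapping `fake_llm` for a function that calls GPT-4, GPT-3.5 Turbo, or Llama-2-70b would leave the chaining logic unchanged. Keeping the steps as separate calls mirrors the modular design, so each module's prompt can be tuned independently.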
Abstract
We propose a novel three-step prompt-tuning method for Bengali Grammatical Error Explanation (BGEE) using state-of-the-art large language models (LLMs) such as GPT-4, GPT-3.5 Turbo, and Llama-2-70b. Our approach identifies and categorizes grammatical errors in Bengali sentences, generates corrected versions of those sentences, and provides a natural language explanation for each identified error. We evaluate our BGEE system with automated evaluation metrics and with a human evaluation conducted by experienced Bengali language experts. With our prompt-tuning approach, GPT-4, the best-performing LLM, surpasses the baseline model on the automated metrics, with a 5.26% improvement in F1 score and a 6.95% improvement in exact match. Furthermore, compared to the previous baseline, GPT-4 reduces wrong error types by 25.51% and wrong error explanations by 26.27%. However, its results still lag behind the human baseline.
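This section does not define how F1 and exact match are computed. As an assumption, the sketch below uses a common SQuAD-style definition. Exact match checks whether a prediction equals its reference after trimming. F1 is the harmonic mean of token precision and recall over whitespace tokens. Both are averaged over the corpus. The function names are hypothetical, and the paper's own tokenization or matching rules may differ.

```python
from collections import Counter
from typing import Dict, List


def exact_match(pred: str, ref: str) -> float:
    """1.0 if the strings match after trimming surrounding whitespace, else 0.0."""
    return float(pred.strip() == ref.strip())


def token_f1(pred: str, ref: str) -> float:
    """Harmonic mean of token precision and recall (assumed whitespace tokenization)."""
    pred_toks, ref_toks = pred.split(), ref.split()
    if not pred_toks and not ref_toks:
        return 1.0  # two empty strings count as a perfect match
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_toks) & Counter(ref_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(ref_toks)
    return 2 * precision * recall / (precision + recall)


def corpus_scores(preds: List[str], refs: List[str]) -> Dict[str, float]:
    """Average both metrics over aligned prediction/reference pairs."""
    if len(preds) != len(refs) or not preds:
        raise ValueError("need equal-length, non-empty lists")
    n = len(preds)
    return {
        "exact_match": sum(exact_match(p, r) for p, r in zip(preds, refs)) / n,
        "f1": sum(token_f1(p, r) for p, r in zip(preds, refs)) / n,
    }
```

For example, with references `["আমি ভাত খাই।", "আমি ভাত খাই।"]`, the predictions `["আমি ভাত খাই।", "আমি ভাত খায়।"]` score an exact match of 0.5 but an F1 of about 0.83. The second prediction still shares two of its three tokens with the reference. This is why F1 and exact match can move by different amounts.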
