Zero-Shot Grammar Competency Estimation Using Large Language Model Generated Pseudo Labels
Sourya Dipta Das, Shubham Kumar, Kuldeep Yadav
TL;DR
This work addresses the scarcity of labeled data for grammar competency scoring, especially in spoken language, by introducing a zero-shot framework that uses rubric-guided, LLM-generated pseudo-labels to train a transformer regression model. A novel noise-robust training scheme with adaptive sample weighting mitigates label noise and enables learning from unlabeled data across written and spoken modalities. The approach is validated on two in-house datasets, SGAD (spoken) and WGAD (written). The experiments show that two factors critically affect performance: pseudo-label quality, which is driven by the choice of LLM, and the alpha parameter, which controls clean-sample retention. Results indicate substantial improvements over baselines and demonstrate both robustness to label noise and cross-modal generalization, with practical implications for scalable, low-resource grammar assessment in education and related domains.
Abstract
Grammar competency estimation is essential for assessing linguistic proficiency in both written and spoken language; however, the spoken modality presents additional challenges due to its spontaneous, unstructured, and disfluent nature. Developing accurate grammar scoring models also requires extensive expert annotation, which makes large-scale data creation impractical. To address these limitations, we propose a zero-shot grammar competency estimation framework that leverages unlabeled data and Large Language Models (LLMs) without relying on manual labels. During training, we obtain LLM predictions on unlabeled data using prompts based on a grammar competency rubric. These predictions, treated as pseudo labels, are used to train a transformer-based model through a novel training framework designed to handle label noise effectively. We show that the choice of LLM for pseudo-label generation critically affects model performance and that the ratio of clean-to-noisy samples during training strongly influences stability and accuracy. Experimental results demonstrate the efficacy of our approach in estimating grammar competency scores with high accuracy. Finally, a qualitative analysis of error intensity and score prediction confirms the robustness and interpretability of our approach, paving the way for scalable, low-resource grammar assessment systems.
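To make the idea of noise-aware training on pseudo labels more concrete, the following is a minimal sketch of one common way to weight samples by their loss. It is an illustration under stated assumptions, not the training scheme used in this work: the function names `adaptive_sample_weights` and `weighted_mse`, the exponential down-weighting rule, and the reading of `alpha` as the fraction of lowest-loss samples treated as clean are all hypothetical choices made for this example.

```python
import math


def adaptive_sample_weights(losses, alpha=0.75):
    """Hypothetical small-loss weighting (an assumption, not the paper's scheme).

    The `alpha` fraction of samples with the smallest loss is treated as
    clean and gets weight 1.0; the remaining samples are down-weighted
    exponentially by how far their loss exceeds the clean threshold.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    n = len(losses)
    if n == 0:
        return []
    # Number of samples retained as clean (at least one).
    n_clean = max(1, math.ceil(alpha * n))
    # Largest loss among the retained clean samples.
    threshold = sorted(losses)[n_clean - 1]
    weights = []
    for loss in losses:
        if loss <= threshold:
            weights.append(1.0)
        else:
            weights.append(math.exp(-(loss - threshold)))
    return weights


def weighted_mse(preds, pseudo_labels, alpha=0.75):
    """Weighted mean squared error against (possibly noisy) pseudo labels."""
    losses = [(p - y) ** 2 for p, y in zip(preds, pseudo_labels)]
    weights = adaptive_sample_weights(losses, alpha)
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


if __name__ == "__main__":
    # Made-up model predictions and LLM pseudo labels, for illustration only.
    preds = [3.1, 2.0, 4.2, 1.0]
    pseudo_labels = [3.0, 2.5, 4.0, 4.5]  # the last label is likely noisy
    print(adaptive_sample_weights(
        [(p - y) ** 2 for p, y in zip(preds, pseudo_labels)]))
    print(weighted_mse(preds, pseudo_labels))
```

In this sketch, a smaller `alpha` trusts fewer pseudo labels at full weight, while `alpha = 1.0` reduces to ordinary mean squared error. In an actual training loop the same weights would be applied to per-sample losses of the regression model at each step.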
