Subjective Question Generation and Answer Evaluation using NLP
G. M. Refatul Islam, Safwan Shaheer, Yaseen Nur, Mohammad Rafid Hamid
TL;DR
This work addresses the under-explored area of subjective question generation and answer evaluation in NLP-enhanced education by leveraging instruction-tuned large language models. It introduces synthetic datasets and a comprehensive training pipeline (LoRA/QLoRA, NF4 quantization, bfloat16 precision) to generate subjective questions and rigorously evaluate student answers, using GPT-4 as the evaluation oracle. Key findings show that Mistral 7B performs best at question generation while GPT-3.5 leads in answer evaluation, indicating that model size, tuning strategy, and the nature of the task all influence performance. The study highlights the potential to automate the assessment of higher-order cognitive skills. It also acknowledges its limitations and outlines concrete avenues for future work to improve robustness and applicability in real classrooms.
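The TL;DR names LoRA/QLoRA as the fine-tuning approach. The sketch below illustrates only the core low-rank idea and is not the authors' pipeline. It leaves out QLoRA's NF4 quantization and bfloat16 compute. A LoRA-adapted linear layer keeps the pretrained weight W0 frozen and adds a scaled low-rank update, h = W0 x + (alpha / r) B A x. Only A and B are trained. The helper names (`matvec`, `lora_forward`, `lora_param_count`) and the toy matrices are hypothetical and exist only for illustration.

```python
"""Minimal sketch of a LoRA (low-rank adaptation) forward pass.

Assumption: toy dimensions and hand-picked matrices for illustration only;
this is not the paper's training code and omits NF4/bfloat16 (QLoRA) details.
"""
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Multiply a (rows x cols) matrix by a length-cols vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    # w0: frozen base weight (d_out x d_in)
    # a:  trainable down-projection (r x d_in)
    # b:  trainable up-projection (d_out x r)
    r = len(a)
    base = matvec(w0, x)              # W0 x: frozen pretrained path
    delta = matvec(b, matvec(a, x))   # B (A x): low-rank update path
    scale = alpha / r                 # LoRA scaling factor alpha / r
    return [h + scale * d for h, d in zip(base, delta)]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # Trainable parameters in A and B, versus d_in * d_out for full fine-tuning.
    return r * (d_in + d_out)


if __name__ == "__main__":
    w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # d_out = 2, d_in = 3
    a = [[1.0, 1.0, 0.0]]                      # rank r = 1
    x = [1.0, 2.0, 3.0]

    # B is typically initialized to zero, so the adapted layer starts
    # out identical to the frozen base layer.
    print(lora_forward(w0, a, [[0.0], [0.0]], x, alpha=2.0))  # [7.0, -1.0]

    # After training, a non-zero B shifts the output.
    print(lora_forward(w0, a, [[0.5], [1.0]], x, alpha=2.0))  # [10.0, 5.0]

    # For a 4096 x 4096 projection with r = 8, LoRA trains far fewer weights.
    print(lora_param_count(4096, 4096, 8), 4096 * 4096)  # 65536 16777216
```

The zero initialization of B makes training start from the pretrained model's behavior. The parameter count shows why low-rank adapters make fine-tuning 7B-scale models practical on limited hardware.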
Abstract
Natural Language Processing (NLP) is one of the most transformative technologies today. It uses artificial intelligence to understand written and spoken human language. It is used for text summarization, grammar checking, sentiment analysis, and advanced chatbots, and it has many more potential use cases. It has also made its mark on the education sector. Much research has already been conducted on objective question generation, and substantial progress has been made there. However, automated subjective question generation and answer evaluation remain comparatively underdeveloped. An automated system that generates subjective questions and evaluates the answers can help teachers assess student work. It can also enhance students' learning experience by allowing them to self-assess their understanding after reading an article or a chapter of a book. This research aims either to improve existing NLP models or to develop a novel one for automated subjective question generation and answer evaluation from text input.
