Beyond Scores: A Modular RAG-Based System for Automatic Short Answer Scoring with Feedback

Menna Fateen; Bo Wang; Tsunenori Mine

TL;DR

This paper proposes a modular retrieval-augmented generation (RAG) based system for automatic short answer scoring with feedback (ASAS-F). It uses RAG as a few-shot example selection method to score answers and generate feedback in zero-shot and few-shot learning scenarios, and it relies on an automatic prompt generation framework so that the system can be adapted without extensive prompt engineering.

Abstract

Automatic short answer scoring (ASAS) helps reduce the grading burden on educators but often lacks detailed, explainable feedback. Existing methods in ASAS with feedback (ASAS-F) rely on fine-tuning language models with limited datasets, an approach that is resource-intensive and struggles to generalize across contexts. Recent approaches using large language models (LLMs) have focused on scoring without extensive fine-tuning. However, they often rely heavily on prompt engineering and either fail to generate elaborated feedback or do not adequately evaluate it. In this paper, we propose a modular retrieval-augmented generation (RAG) based ASAS-F system that scores answers and generates feedback in strict zero-shot and few-shot learning scenarios. We design our system to be adaptable to various educational tasks without extensive prompt engineering by using an automatic prompt generation framework. Results show a 9% improvement in scoring accuracy on unseen questions compared to fine-tuning, offering a scalable and cost-effective solution.
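To make the idea of "RAG as a few-shot selection method" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple token-overlap (Jaccard) similarity as a stand-in for the ColBERT retriever named in the paper outline, and every name in it (`retrieve`, `majority_vote`, `build_prompt`, the label set, and the example pool) is hypothetical. It shows two steps: picking the most similar already-graded answers as few-shot examples, and taking a similarity-based majority vote over their labels.

```python
from collections import Counter

# Hypothetical pool of already-graded answers (illustrative data only).
POOL = [
    {"answer": "the router forwards packets between networks",
     "label": "correct", "feedback": "Good: you named packet forwarding."},
    {"answer": "a router forwards packets across networks",
     "label": "correct", "feedback": "Correct description of routing."},
    {"answer": "the switch stores files on disk",
     "label": "incorrect", "feedback": "This describes storage, not routing."},
    {"answer": "routers print documents",
     "label": "incorrect", "feedback": "Routers do not print documents."},
]


def tokenize(text):
    """Lowercase whitespace tokenization into a set of tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-overlap similarity; an assumed stand-in for a neural retriever."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(answer, pool, k=3):
    """Return the k graded examples most similar to the student answer."""
    ranked = sorted(pool, key=lambda ex: jaccard(answer, ex["answer"]),
                    reverse=True)
    return ranked[:k]


def majority_vote(examples):
    """Predict the most common label among the retrieved examples."""
    counts = Counter(ex["label"] for ex in examples)
    return counts.most_common(1)[0][0]


def build_prompt(question, answer, examples):
    """Assemble a few-shot prompt from the retrieved examples."""
    parts = [f"Question: {question}", ""]
    for ex in examples:
        parts += [f"Answer: {ex['answer']}",
                  f"Score: {ex['label']}",
                  f"Feedback: {ex['feedback']}", ""]
    parts += [f"Answer: {answer}", "Score:"]
    return "\n".join(parts)


QUESTION = "What does a router do?"
STUDENT_ANSWER = "the router forwards packets to other networks"

TOP = retrieve(STUDENT_ANSWER, POOL, k=3)
VOTE = majority_vote(TOP)
PROMPT = build_prompt(QUESTION, STUDENT_ANSWER, TOP)
```

In this sketch the vote gives a retrieval-only baseline label, while the assembled prompt would be passed to an LLM to produce both a score and feedback; how the paper combines these steps is described in its methodology sections, not here.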

Paper Structure (32 sections, 2 equations, 10 figures, 10 tables)

This paper contains 32 sections, 2 equations, 10 figures, 10 tables.

Introduction
Related Work
Automatic Short Answer Scoring
ASAS Using Generative Models
ASAS with Feedback
Methodology
Problem Formulation
ASAS-F-Z: Zero-Shot ASAS-F
ASAS-F-Opt: Automatic Few-Shot Optimization with DSPy
ASAS-F-RAG: Few-Shot ASAS-F Using RAG
Similarity-Based Majority-Vote with ColBERT
ASAS-F-RAG
Experimental Setup
Dataset
Evaluation Metrics
...and 17 more sections

Figures (10)

Figure 1: Overview of the implementation of the modular ASAS-F-Z and ASAS-F-RAG systems using DSPy.
Figure 2: Overview of the ASAS-F system using LLMs and ColBERT-driven RAG.
Figure 3: Example of feedback generated by the ASAS-F system compared to the reference feedback. Traditional metrics may not capture the nuances of feedback quality.
Figure 4: Performance of the ASAS-F-Z system on the SAF dataset. Higher is better for accuracy and F1 score, lower is better for RMSE.
Figure 5: Performance of the ASAS-F-RAG system on the SAF dataset. Higher is better for accuracy and F1 score, lower is better for RMSE.
...and 5 more figures
