Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs

SeongYeub Chu; JongWoo Kim; Bryan Wong; MunYong Yi

TL;DR

This work introduces RMTS, a two-stage framework that leverages prompt-engineered LLMs to generate trait-specific, rubric-aligned rationales and a shared encoder-decoder S-LLM to produce trait scores. By embedding rationale content alongside essays, RMTS improves trait-wise reliability and interpretability in automated essay scoring, achieving consistent gains on ASAP/ASAP++ and the Feedback Prize dataset over strong baselines. The approach highlights the value of rationale-informed decoding for rubric-aligned evaluation and provides evidence of faithfulness and similarity properties of generated rationales. Practical impact includes improved multi-trait scoring accuracy and transparency, with code available for reproducibility.

Abstract

Existing automated essay scoring (AES) has relied solely on essay text, without using explanatory rationales for the scores, and has thereby forgone an opportunity to capture, in a fine-grained manner, the specific aspects evaluated by rubric indicators. This paper introduces Rationale-based Multiple Trait Scoring (RMTS), a novel approach to multi-trait essay scoring that integrates prompt-engineering-based large language models (LLMs) with a fine-tuning-based essay scoring model built on a smaller large language model (S-LLM). RMTS uses an LLM-based trait-wise rationale generation system in which a separate LLM agent generates trait-specific rationales based on rubric guidelines; the scoring model then uses these rationales to predict multi-trait scores accurately. Extensive experiments on benchmark datasets, including ASAP, ASAP++, and Feedback Prize, show that RMTS significantly outperforms state-of-the-art models and vanilla S-LLMs in trait-specific scoring. By supporting quantitative assessment with fine-grained qualitative rationales, RMTS enhances trait-wise reliability and provides partial explanations about the essays. The code is available at https://github.com/BBeeChu/RMTS.git.
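
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the linked repository for that). The prompt wording, the `Trait` class, the helper names, the scorer input layout, and the "Trait score, Trait score" output format are all assumptions made for illustration, and the LLM is represented by an arbitrary callable so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trait:
    """Hypothetical container for one scoring trait and its rubric text."""
    name: str
    rubric: str


def build_rationale_prompt(essay: str, trait: Trait) -> str:
    # Stage 1 prompt (assumed wording): ask for a rubric-grounded
    # rationale covering a single trait.
    return (
        f"Rubric for {trait.name}:\n{trait.rubric}\n\n"
        f"Essay:\n{essay}\n\n"
        f"Explain how the essay performs on {trait.name}, citing the rubric."
    )


def generate_rationales(
    essay: str, traits: List[Trait], llm: Callable[[str], str]
) -> Dict[str, str]:
    # One LLM call per trait, so each rationale is trait-specific.
    return {t.name: llm(build_rationale_prompt(essay, t)) for t in traits}


def build_scorer_input(essay: str, rationales: Dict[str, str]) -> str:
    # Stage 2 input (assumed layout): the essay followed by every
    # trait rationale, to be consumed by the fine-tuned scoring model.
    parts = [f"Essay: {essay}"]
    for name, rationale in rationales.items():
        parts.append(f"Rationale ({name}): {rationale}")
    parts.append("Scores:")
    return "\n".join(parts)


def parse_scores(decoded: str) -> Dict[str, int]:
    # Parse an assumed "Trait score, Trait score" output string.
    scores: Dict[str, int] = {}
    for chunk in decoded.split(","):
        name, _, value = chunk.strip().rpartition(" ")
        if name and value.isdigit():
            scores[name] = int(value)
    return scores


if __name__ == "__main__":
    traits = [
        Trait("Content", "Ideas are clear and well supported."),
        Trait("Organization", "The essay follows a logical structure."),
    ]

    def stub_llm(prompt: str) -> str:
        # Stand-in for a real LLM call; returns a fixed string.
        return "placeholder rationale"

    rationales = generate_rationales("My essay text.", traits, stub_llm)
    print(build_scorer_input("My essay text.", rationales))
    print(parse_scores("Content 3, Organization 4"))
```

Keeping rationale generation behind a plain callable separates the prompting stage from the scoring stage, so either can be swapped out independently in this sketch.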
