Ensemble ToT of LLMs and Its Application to Automatic Grading System for Supporting Self-Learning

Yuki Ito; Qiang Ma

TL;DR

This paper addresses the challenge of providing detailed, timely grading feedback by moving beyond single-LLM grading to Ensemble ToT, a framework that coordinates multiple language models. It introduces GET, a grading system that uses pseudo-learning to identify each LLM's grading tendencies, generates multiple candidate gradings via Tree-of-Thought, and integrates them through a simulated debate to produce accurate, explainable grading reasons. On SAF, GET achieves higher grading-label accuracy and macro F1 than the baselines on the unseen-question and unseen-answer subsets, and its feedback also scores higher under automated quality evaluation. The work highlights the practical potential of multi-LLM collaboration for scalable self-learning support, while noting limitations such as dependence on a fixed set of models and the need for user-perception studies.

Abstract

Providing students with detailed and timely grading feedback is essential for self-learning. While existing LLM-based grading systems are promising, most rely on a single model, which limits their performance. To address this, we propose Ensemble Tree-of-Thought (ToT), a framework that enhances LLM outputs by integrating multiple models, and we use it to develop a grading system. Ensemble ToT follows three steps: (1) analyzing LLM performance, (2) generating candidate answers, and (3) refining them into a final result. Accordingly, our grading system first evaluates the grading tendencies of the LLMs, then generates multiple grading results, and finally integrates them via a simulated debate. Experimental results demonstrate that our approach provides accurate and explainable grading by effectively coordinating multiple LLMs.
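
The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name in it (`Grader`, `analyze_tendencies`, `generate_candidates`, `integrate`, `ensemble_grade`) is hypothetical, each LLM is assumed to be a plain callable that returns a grading label, and the simulated debate of step 3 is replaced by a much simpler accuracy-weighted vote so that the example stays self-contained.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM call: (question, answer) -> grading label.
Grader = Callable[[str, str], str]
# A labeled example: (question, answer, reference label).
Example = Tuple[str, str, str]


def analyze_tendencies(graders: Dict[str, Grader],
                       labeled: List[Example]) -> Dict[str, float]:
    """Step 1 (assumed form): score each grader by accuracy on labeled examples."""
    if not labeled:
        raise ValueError("at least one labeled example is required")
    scores = {}
    for name, grade in graders.items():
        hits = sum(grade(q, a) == y for q, a, y in labeled)
        scores[name] = hits / len(labeled)
    return scores


def generate_candidates(graders: Dict[str, Grader],
                        question: str, answer: str) -> Dict[str, str]:
    """Step 2: collect one candidate label from every grader."""
    return {name: grade(question, answer) for name, grade in graders.items()}


def integrate(candidates: Dict[str, str], scores: Dict[str, float]) -> str:
    """Step 3 (simplified): accuracy-weighted vote, NOT a simulated debate."""
    totals: Dict[str, float] = defaultdict(float)
    for name, label in candidates.items():
        totals[label] += scores[name]
    return max(totals, key=totals.get)


def ensemble_grade(graders: Dict[str, Grader], labeled: List[Example],
                   question: str, answer: str) -> str:
    """Run the three steps in order and return the final label."""
    scores = analyze_tendencies(graders, labeled)
    candidates = generate_candidates(graders, question, answer)
    return integrate(candidates, scores)
```

The vote weights each candidate by how well its grader did in step 1, which captures only the idea that the integration step can take per-model tendencies into account. It produces a label but no grading reasons, which the debate-based integration described in the abstract is meant to provide.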
