MiRAGE: Misconception Detection with Retrieval-Guided Multi-Stage Reasoning and Ensemble Fusion

Cuong Van Duc; Thai Tran Quoc; Minh Nguyen Dinh Tuan; Tam Vu Duc; Son Nguyen Van; Hanh Nguyen Thi

TL;DR

This work tackles automatic detection of mathematical misconceptions in open-ended student responses. It introduces MiRAGE, a retrieval-guided, multi-stage reasoning framework that combines retrieval, chain-of-thought (CoT) reasoning, and cross-attention reranking through ensemble fusion to deliver accurate, interpretable predictions while reducing reliance on large LLMs. A compact Reasoner is trained by knowledge distillation from a CoT teacher, and the retrieval stage is trained with a masked supervised contrastive loss. MiRAGE achieves MAP@1=$0.82$, MAP@3=$0.92$, and MAP@5=$0.93$ on MAP Student Misconceptions data. The results indicate that misconception detection can be made scalable and cost-efficient with improved robustness and interpretability, which makes the approach suitable for large-scale educational assessment. Potential extensions include applying MiRAGE to science and language domains and incorporating multimodal student work.

Abstract

Detecting student misconceptions in open-ended responses is a longstanding challenge, demanding semantic precision and logical reasoning. We propose MiRAGE (Misconception Detection with Retrieval-Guided Multi-Stage Reasoning and Ensemble Fusion), a novel framework for automated misconception detection in mathematics. MiRAGE operates in three stages: (1) a Retrieval module narrows a large candidate pool to a semantically relevant subset; (2) a Reasoning module employs chain-of-thought generation to expose logical inconsistencies in student solutions; and (3) a Reranking module refines predictions by aligning them with the reasoning. These components are unified through an ensemble-fusion strategy that enhances robustness and interpretability. On mathematics datasets, MiRAGE achieves Mean Average Precision scores of 0.82, 0.92, and 0.93 at cutoffs 1, 3, and 5, respectively, consistently outperforming the individual modules. By coupling retrieval guidance with multi-stage reasoning, MiRAGE reduces dependence on large-scale language models while delivering a scalable and effective solution for educational assessment.
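For readers unfamiliar with the reported metric, the following is a minimal sketch of how Mean Average Precision at cutoff k can be computed. It assumes each student response has exactly one gold misconception label, which the abstract does not state explicitly; under that assumption the average precision of a response reduces to the reciprocal rank of the gold label when it appears in the top k, and zero otherwise. The function name `map_at_k` and the toy labels are illustrative and not taken from the paper.

```python
from typing import Sequence


def map_at_k(gold: Sequence[str], ranked: Sequence[Sequence[str]], k: int) -> float:
    """MAP@k under the assumption of exactly one gold label per response.

    gold[i]   -- the single correct label for response i (assumption)
    ranked[i] -- predicted labels for response i, best first
    """
    if len(gold) != len(ranked):
        raise ValueError("gold and ranked must have the same length")
    if not gold:
        return 0.0

    total = 0.0
    for label, preds in zip(gold, ranked):
        top_k = list(preds[:k])
        if label in top_k:
            # Reciprocal of the 1-based rank of the gold label.
            total += 1.0 / (top_k.index(label) + 1)
    return total / len(gold)


# Toy data (hypothetical labels): gold label ranked 1st, 2nd, and missing.
gold_labels = ["a", "b", "c"]
predictions = [["a", "x", "y"], ["x", "b", "y"], ["x", "y", "z"]]
map1 = map_at_k(gold_labels, predictions, k=1)
map3 = map_at_k(gold_labels, predictions, k=3)
```

On the toy data, `map1` is 1/3 (only the first response has the gold label at rank 1) and `map3` is 0.5 (contributions of 1, 1/2, and 0 averaged over three responses). This also shows why MAP@k can only stay equal or grow as k increases, consistent with the 0.82, 0.92, 0.93 progression reported above.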
