
MiRAGE: Misconception Detection with Retrieval-Guided Multi-Stage Reasoning and Ensemble Fusion

Cuong Van Duc, Thai Tran Quoc, Minh Nguyen Dinh Tuan, Tam Vu Duc, Son Nguyen Van, Hanh Nguyen Thi

TL;DR

This work tackles automatic misconception detection in mathematics from open-ended student responses. It introduces MiRAGE, a retrieval-guided, multi-stage reasoning framework. MiRAGE combines retrieval, chain-of-thought reasoning, and cross-attention reranking within an ensemble fusion to deliver accurate, interpretable predictions while reducing reliance on very large language models. It uses knowledge distillation from a chain-of-thought (CoT) teacher to train a compact Reasoner, and a masked supervised contrastive loss to train the retriever. On the MAP Student Misconceptions dataset it achieves MAP@1 = 0.82, MAP@3 = 0.92, and MAP@5 = 0.93. The results demonstrate scalable, cost-efficient misconception detection with improved robustness and interpretability, making it suitable for large-scale educational assessment. Potential extensions include applying MiRAGE to science and language domains and incorporating multimodal student work.
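The scores above are MAP@k values. This section does not define the metric, so the following sketch assumes the common single-label convention. Under that convention each response has exactly one gold misconception. A response scores 1/rank if the gold label appears among its top k predictions and 0 otherwise, and MAP@k is the mean over responses. The function names are hypothetical. The sketch also shows why MAP@1 ≤ MAP@3 ≤ MAP@5 must hold under this convention.

```python
from typing import Sequence


def average_precision_at_k(ranked: Sequence[str], gold: str, k: int) -> float:
    """AP@k assuming exactly one gold label per response (an assumption).

    With a single relevant label, AP@k reduces to 1/rank of the first
    occurrence of the gold label within the top k, or 0 if it is absent.
    Returning on the first match means duplicate predictions count once.
    """
    for rank, label in enumerate(ranked[:k], start=1):
        if label == gold:
            return 1.0 / rank
    return 0.0


def map_at_k(all_ranked: Sequence[Sequence[str]], all_gold: Sequence[str], k: int) -> float:
    """Mean of AP@k over all responses."""
    if len(all_ranked) != len(all_gold):
        raise ValueError("need one ranked list per gold label")
    if not all_gold:
        raise ValueError("no responses to score")
    total = sum(average_precision_at_k(r, g, k) for r, g in zip(all_ranked, all_gold))
    return total / len(all_gold)
```

For example, with predictions `[["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]` and gold labels `["A", "A", "A"]`, MAP@1 is 1/3 because only the first response is right at rank 1. MAP@3 is (1 + 1/2 + 1/3) / 3 ≈ 0.611, because a larger k can only add credit.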

Abstract

Detecting student misconceptions in open-ended responses is a longstanding challenge, demanding semantic precision and logical reasoning. We propose MiRAGE (Misconception Detection with Retrieval-Guided Multi-Stage Reasoning and Ensemble Fusion), a novel framework for automated misconception detection in mathematics. MiRAGE operates in three stages: (1) a Retrieval module narrows a large candidate pool to a semantically relevant subset; (2) a Reasoning module employs chain-of-thought generation to expose logical inconsistencies in student solutions; and (3) a Reranking module refines predictions by aligning them with the reasoning. These components are unified through an ensemble-fusion strategy that enhances robustness and interpretability. On mathematics datasets, MiRAGE achieves Mean Average Precision scores of 0.82, 0.92, and 0.93 at cutoffs 1, 3, and 5 (MAP@1/3/5), consistently outperforming each of its individual modules. By coupling retrieval guidance with multi-stage reasoning, MiRAGE reduces dependence on large-scale language models while delivering a scalable and effective solution for educational assessment.
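Stage (1) can be pictured as dense nearest-neighbour search over the label pool. The following is a minimal sketch under several assumptions. The student response (query) and each candidate misconception label are assumed to be already embedded as vectors by some encoder, which is not shown and is not necessarily the paper's. Relevance is assumed to be cosine similarity. `retrieve_top_k` and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def retrieve_top_k(
    query_vec: Sequence[float],
    label_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every candidate label against the query and keep the k best.

    Hypothetical stand-in for the Retrieval module: only these k
    candidates would be passed on to reasoning-aware reranking.
    """
    scored = [(label, cosine(query_vec, vec)) for label, vec in label_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For example, `retrieve_top_k([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, 2)` keeps `"a"` (similarity 1.0) and `"c"` (about 0.707) and drops `"b"`. At realistic pool sizes an approximate nearest-neighbour index would replace the linear scan, but the narrowing step is the same.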

Paper Structure

This paper contains 16 sections, 18 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: The MiRAGE pipeline framework. The query is first embedded by the Retrieval module to select top-$k$ candidate labels. In parallel, the Reasoning module generates explanations. Both are then passed to the Reranking module, which realigns scores with the reasoning. Finally, retrieval and reranking scores are fused through an ensemble strategy to produce the final ranking.
  • Figure 2: Prompt for the Reranking module.
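Figure 1 says that retrieval and reranking scores are fused into a final ranking, but this section does not give the fusion rule. The following sketch shows one plausible instance under that gap: weighted linear fusion after min-max normalization, restricted to candidates scored by both modules. The weight `w` and the function names are hypothetical. The paper's actual ensemble may differ, and rank-based schemes such as reciprocal rank fusion are another common choice.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; constant scores all map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {c: 1.0 for c in scores}
    return {c: (s - lo) / (hi - lo) for c, s in scores.items()}


def fuse(retrieval: Dict[str, float], rerank: Dict[str, float], w: float = 0.5) -> List[str]:
    """Rank candidates by w * retrieval + (1 - w) * rerank after normalization.

    Only candidates present in both score maps are fused, since the reranker
    sees only the retriever's top-k. Ties are broken alphabetically so the
    output is deterministic.
    """
    common = retrieval.keys() & rerank.keys()
    if not common:
        return []
    r = min_max({c: retrieval[c] for c in common})
    q = min_max({c: rerank[c] for c in common})
    fused = {c: w * r[c] + (1.0 - w) * q[c] for c in common}
    return sorted(fused, key=lambda c: (-fused[c], c))
```

With retrieval scores `{"a": 0.9, "b": 0.8, "c": 0.1}` and reranker scores `{"a": 0.2, "b": 0.9, "c": 0.3}`, `w = 1.0` reproduces the retrieval order `a, b, c`. `w = 0.5` promotes `b` to the top (`b, a, c`). This is the kind of reasoning-driven correction that the Reranking module is meant to contribute.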