Adaptation of the Multi-Concept Multivariate Elo Rating System to Medical Students Training Data
Erva Nihan Kandemir, Jill-Jenn Vie, Adam Sanchez-Ayte, Olivier Palombi, Franck Ramus
TL;DR
This work evaluates a multi-concept Elo rating framework on the BNE medical training platform to predict student performance and question difficulty in a large, sparse, multi-specialty setting. It extends Elo with guessing behavior, dynamic uncertainty, and multiple knowledge components, and shows that Elo achieves predictive accuracy comparable to logistic regression on mock exams while enabling real-time, interpretable knowledge tracing. Initializing Elo with estimates from a logistic regression fitted on prior-year data speeds up convergence and improves early accuracy, highlighting a practical path for online adaptive learning. The study also discusses data characteristics, limitations, and directions for future work, including forgetting curves and online recommendation strategies.
Abstract
Accurate estimation of question difficulty and prediction of student performance play key roles in optimizing educational instruction and enhancing learning outcomes within digital learning platforms. The Elo rating system is widely recognized for its proficiency in predicting student performance by estimating both question difficulty and student ability while providing computational efficiency and real-time adaptivity. This paper presents an adaptation of a multi-concept variant of the Elo rating system to data collected by a medical training platform. The platform poses unique challenges: a vast knowledge corpus, substantial inter-concept overlap, a huge question bank with significant sparsity in user-question interactions, and a highly diverse user population. Our study is driven by two primary objectives: first, to comprehensively evaluate the Elo rating system's capabilities on this real-life data, and second, to tackle the issue of imprecise early-stage estimates when implementing the Elo rating system for online assessments. Our findings suggest that the Elo rating system achieves accuracy comparable to that of the well-established logistic regression model in predicting final exam outcomes for users of our digital platform. Furthermore, the results show that initializing Elo rating estimates with historical data markedly reduces errors and enhances prediction accuracy, especially during the initial phases of student interactions.
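For readers unfamiliar with Elo in an educational setting, the basic single-concept update can be sketched as follows. This is a minimal illustration under stated assumptions, not the multi-concept variant evaluated in the paper: the function names, the constant step size `k = 0.4`, the `guess` floor for multiple-choice items, and the example prior difficulty are all hypothetical choices made for this sketch.

```python
import math


def expected_correct(ability, difficulty, guess=0.0):
    """Probability of a correct answer under a logistic Elo model.

    `guess` is an assumed lower bound on success, e.g. 1/n for an
    n-option multiple-choice question (illustrative, not from the paper).
    """
    logistic = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    return guess + (1.0 - guess) * logistic


def elo_update(ability, difficulty, correct, k=0.4, guess=0.0):
    """Return updated (ability, difficulty) after one observed answer.

    `correct` is 1 for a right answer and 0 for a wrong one.
    `k` is an assumed constant step size; an uncertainty-aware variant
    would typically shrink it as more answers are observed.
    """
    error = correct - expected_correct(ability, difficulty, guess)
    return ability + k * error, difficulty - k * error


# Cold start: both estimates begin at zero.
ability, difficulty = elo_update(0.0, 0.0, correct=1)

# Warm start: the question difficulty is seeded from a historical
# estimate (a hypothetical value here) instead of zero.
prior_difficulty = 1.2
warm_ability, warm_difficulty = elo_update(0.0, prior_difficulty, correct=1)
```

In the cold-start case every question begins with the same difficulty, so early predictions carry little information. Seeding difficulties from historical estimates gives the first predictions a meaningful starting point, which is the intuition behind the initialization result reported above.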
