A Bandit-Based Approach to Educational Recommender Systems: Contextual Thompson Sampling for Learner Skill Gain Optimization
Lukas De Kerpel, Arthur Thuy, Dries F. Benoit
TL;DR
This work addresses scalable personalization of practice in digital OR/MS/Analytics education by recasting exercise sequencing as a contextual bandit problem that optimizes for skill gain. It applies Linear Thompson Sampling (LinTS), which conditions its choices on learner context, and compares it with non-contextual Thompson Sampling and collaborative filtering baselines, with rewards defined as the change in skill mastery estimated via Bayesian Knowledge Tracing. On the ASSISTments 2017 dataset, LinTS outperforms all baselines, achieving a final average skill-gain reward of $0.198$ and showing favorable exploration–exploitation dynamics that concentrate recommendations on high-value exercises. The approach offers adaptive, data-driven guidance for instructors, highlighting effective exercises and enabling personalized remediation at scale, with potential extensions to richer context and multi-objective educational goals.
Abstract
In recent years, instructional practices in Operations Research (OR), Management Science (MS), and Analytics have increasingly shifted toward digital environments, where large and diverse groups of learners make it difficult to provide practice that adapts to individual needs. This paper introduces a method that generates personalized sequences of exercises by selecting, at each step, the exercise most likely to advance a learner's understanding of a targeted skill. The method uses information about the learner and their past performance to guide these choices, and learning progress is measured as the change in estimated skill level before and after each exercise. Using data from an online mathematics tutoring platform, we find that the approach recommends exercises associated with greater skill improvement and adapts effectively to differences across learners. From an instructional perspective, the framework enables personalized practice at scale, highlights exercises with consistently strong learning value, and helps instructors identify learners who may benefit from additional support.
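To make the selection loop described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard Bayesian Knowledge Tracing update with hypothetical slip, guess, and learn parameters, and a Linear Thompson Sampling variant that keeps one independent linear model per exercise. The class and function names, the two-dimensional context vector, and the exploration scale `v` are all illustrative assumptions.

```python
import numpy as np


def bkt_update(p_mastery, correct, slip=0.1, guess=0.2, learn=0.15):
    """One standard BKT step: posterior given the response, then the learning transition.

    The slip/guess/learn values are hypothetical defaults, not fitted parameters.
    """
    if correct:
        num = p_mastery * (1 - slip)
        den = num + (1 - p_mastery) * guess
    else:
        num = p_mastery * slip
        den = num + (1 - p_mastery) * (1 - guess)
    posterior = num / den
    return posterior + (1 - posterior) * learn


class LinTS:
    """Linear Thompson Sampling with an independent linear model per arm (an assumed variant)."""

    def __init__(self, n_arms, dim, v=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.v = v  # scale of the posterior sampling noise (exploration)
        self.B = np.stack([np.eye(dim) for _ in range(n_arms)])  # per-arm precision matrices
        self.f = np.zeros((n_arms, dim))  # per-arm reward-weighted context sums

    def select(self, x):
        """Sample one weight vector per arm and pick the arm with the highest sampled reward."""
        scores = []
        for B, f in zip(self.B, self.f):
            cov = np.linalg.inv(B)
            mu = cov @ f
            theta = self.rng.multivariate_normal(mu, self.v ** 2 * cov)
            scores.append(theta @ x)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Update the chosen arm's model with the observed context and reward."""
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x


# One interaction: recommend an exercise, observe the response, reward = skill gain.
p_before = 0.4
x = np.array([1.0, p_before])  # hypothetical context: bias term and current mastery estimate
bandit = LinTS(n_arms=3, dim=2)
arm = bandit.select(x)
p_after = bkt_update(p_before, correct=True)  # assume the learner answered correctly
reward = p_after - p_before
bandit.update(arm, x, reward)
```

In this sketch the reward is the difference between the mastery estimate after and before the exercise, mirroring the skill-gain definition in the abstract; a real context vector would likely carry more learner and performance features than the two used here.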
