EduQate: Generating Adaptive Curricula through RMABs in Education Settings

Sidney Tio; Dexun Li; Pradeep Varakantham

TL;DR

The paper proposes EduQate, a method that uses interdependency-aware Q-learning to make informed arm-selection decisions at each time step. It establishes an optimality guarantee for EduQate and demonstrates its efficacy compared to baseline policies.

Abstract

There has been significant interest in the development of personalized and adaptive educational tools that cater to a student's individual learning progress. A crucial aspect of developing such tools is exploring how mastery can be achieved efficiently across a diverse yet related range of content. While Reinforcement Learning and Multi-armed Bandits have shown promise in educational settings, existing work often assumes that learning content is independent, neglecting the prevalent interdependencies between such content. In response, we introduce Education Network Restless Multi-armed Bandits (EdNetRMABs), which use a network to represent the relationships between interdependent arms. We then propose EduQate, a method employing interdependency-aware Q-learning to make informed decisions on arm selection at each time step. We establish the optimality guarantee of EduQate and demonstrate its efficacy compared to baseline policies, using students modeled from both synthetic and real-world data.
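As an illustration of the EdNetRMAB idea, the Python sketch below models a toy network of learning items. It is a minimal sketch under stated assumptions, not the paper's student model: arm states are assumed binary (0 = not yet mastered, 1 = mastered), the network is assumed to be a map from each arm to its topic neighbours, and teaching an arm is assumed to give its neighbours a smaller chance of being learned. The class name ToyEdNetRMAB and all probabilities are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyEdNetRMAB:
    """Hypothetical toy EdNetRMAB: binary arm states linked by topic groups.

    All probabilities are illustrative assumptions, not values from the paper.
    """
    groups: dict[int, list[int]]       # arm -> neighbouring arms (same topic)
    n_arms: int
    p_learn_active: float = 0.6        # P(0 -> 1) for the arm that is taught
    p_learn_network: float = 0.2       # P(0 -> 1) for neighbours of the taught arm
    p_learn_passive: float = 0.0       # P(0 -> 1) for all other arms
    p_forget: float = 0.1              # P(1 -> 0), applied only to the "other" arms
    states: list[int] = field(default_factory=list)

    def reset(self, seed=None):
        """Start with every item unmastered; seed the RNG for reproducibility."""
        self.rng = random.Random(seed)
        self.states = [0] * self.n_arms
        return list(self.states)

    def step(self, arm):
        """Teach one arm; return (new states, reward = number of mastered items)."""
        neighbours = set(self.groups.get(arm, []))
        for i in range(self.n_arms):
            if i == arm:                       # active action
                p_up, p_down = self.p_learn_active, 0.0
            elif i in neighbours:              # assumed network spill-over
                p_up, p_down = self.p_learn_network, 0.0
            else:                              # passive action
                p_up, p_down = self.p_learn_passive, self.p_forget
            if self.states[i] == 0 and self.rng.random() < p_up:
                self.states[i] = 1
            elif self.states[i] == 1 and self.rng.random() < p_down:
                self.states[i] = 0
        return list(self.states), sum(self.states)
```

For example, ToyEdNetRMAB(groups={0: [1], 1: [0], 2: []}, n_arms=3) followed by reset(seed=0) and step(0) can move both arm 0 and its neighbour arm 1 to the mastered state in a single step, which is the cross-arm effect that a bandit assuming independent arms would not model.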

Paper Structure (32 sections, 3 theorems, 8 equations, 5 figures, 5 tables, 1 algorithm)

Introduction
Related Work and Preliminaries
Restless Multi-Armed Bandits
Reinforcement Learning in Education
Model
EdNetRMABs
Arms
State space
Action space
Transition function
EduQate
Analysis of EduQate
Experiment
Experiment setup
Creating student models
...and 17 more sections

Key Result

Theorem 1

Choosing the arm with the largest $\lambda$ value, as defined in the paper's equation eq:new_lambda, is equivalent to maximizing the cumulative long-term reward.
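Theorem 1 justifies a greedy rule: compute $\lambda$ for every arm and pull the argmax. The exact definition of $\lambda$ lives in the paper's eq:new_lambda and is not reproduced on this page, so the Python sketch below assumes a plausible interdependency-aware form: an arm's own Q-value advantage of being taught plus the advantages its topic neighbours gain through the network. The Q-table layout q[arm][state][action], the action labels PASSIVE, ACTIVE and NETWORK, and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical action indices for each arm's Q-table (not the paper's notation).
PASSIVE, ACTIVE, NETWORK = 0, 1, 2


def lambda_values(q: Sequence[Sequence[Sequence[float]]],
                  states: Sequence[int],
                  neighbours: Dict[int, List[int]]) -> List[float]:
    """Assumed interdependency-aware index for every arm.

    lambda_i = [Q_i(s_i, ACTIVE) - Q_i(s_i, PASSIVE)]
             + sum over neighbours j of [Q_j(s_j, NETWORK) - Q_j(s_j, PASSIVE)]
    This form is an illustrative assumption, not eq:new_lambda itself.
    """
    lams = []
    for i, s in enumerate(states):
        own_gain = q[i][s][ACTIVE] - q[i][s][PASSIVE]
        network_gain = sum(q[j][states[j]][NETWORK] - q[j][states[j]][PASSIVE]
                           for j in neighbours.get(i, []))
        lams.append(own_gain + network_gain)
    return lams


def select_top_arm(lams: Sequence[float]) -> int:
    """Greedy rule from Theorem 1: pull the arm with the largest lambda.

    max() returns the first maximal index, so ties go to the lowest arm index.
    """
    return max(range(len(lams)), key=lambda i: lams[i])


if __name__ == "__main__":
    # q[arm][state][action]; only state 0 (not yet mastered) is used here.
    q = [
        [[0.0, 0.5, 0.1], [1.0, 1.0, 1.0]],  # arm 0
        [[0.0, 0.2, 0.4], [1.0, 1.0, 1.0]],  # arm 1
        [[0.0, 0.6, 0.0], [1.0, 1.0, 1.0]],  # arm 2
    ]
    lams = lambda_values(q, [0, 0, 0], {0: [1], 1: [0], 2: []})
    print(lams, select_top_arm(lams))
```

In this toy table, arm 2 has the largest stand-alone advantage (0.6 versus 0.5 for arm 0), but arm 0 is selected once the 0.4 network gain on its neighbour, arm 1, is counted. An index that treats arms as independent would pick arm 2 instead.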

Figures (5)

Figure 1: Average rewards for the respective algorithms on 3 datasets, averaged across 30 runs. Shaded regions represent standard error.
Figure 2: Comparison of network complexity across the experimental datasets. The synthetic dataset (left) shows simpler, isolated groups, while the real-world datasets (Junyi, middle; OLI, right) display more intricate and interconnected relationships among items.
Figure 3: Average rewards for the respective algorithms on the last episode of training. As $N_{topics}$ increases, network effects are reduced, and most algorithms perform no better than a random policy.
Figure 4: Synthetic network when $N_{topics} = 40$. Some arms have no group members and therefore receive no benefit from the network. Node colors represent topic groups.
Figure 5: Average rewards over 800 training episodes, averaged across 30 seeds. EduQate- (orange) refers to the EduQate algorithm without a replay buffer.

Theorems & Definitions (3)

Theorem 1
Theorem 2
Theorem 3
