EduGym: An Environment and Notebook Suite for Reinforcement Learning Education

Thomas M. Moerland; Matthias Müller-Brockhausen; Zhao Yang; Andrius Bernatavicius; Koen Ponse; Tom Kouwenhoven; Andreas Sauter; Michiel van der Meer; Bram Renting; Aske Plaat

TL;DR

EduGym addresses a key gap in reinforcement learning education by providing a library of low-dimensional, pedagogically focused environments that isolate specific RL challenges, paired with interactive notebooks that connect theory and code. The nine challenges (e.g., exploration, partial observability, stochasticity, model-based planning) allow students to experiment with tunable difficulty and observe how different algorithms respond in controlled settings. Empirical student evaluation shows high perceived utility for both conceptual and practical understanding, supporting EduGym’s potential as a scalable teaching tool. The work highlights a shift toward interpretable, experiment-friendly educational resources that complement existing textbooks and public codebases, with open-source materials and online notebooks available for broader adoption.

Abstract

Due to the empirical success of reinforcement learning, an increasing number of students study the subject. However, in our practical teaching experience, students entering the field (bachelor, master and early PhD) often struggle. On the one hand, textbooks and (online) lectures provide the fundamentals, but students find it hard to translate between equations and code. On the other hand, public codebases do provide practical examples, but the implemented algorithms tend to be complex, and the underlying test environments contain multiple reinforcement learning challenges at once. Although this is realistic from a research perspective, it often hinders conceptual understanding. To address this issue, we introduce EduGym, a set of educational reinforcement learning environments with associated interactive notebooks. Each EduGym environment is specifically designed to illustrate a certain aspect or challenge of reinforcement learning (e.g., exploration, partial observability, stochasticity), while the associated interactive notebook explains the challenge and its possible solution approaches, connecting equations and code in a single document. An evaluation among RL students and researchers shows that 86% of them think EduGym is a useful tool for reinforcement learning education. All notebooks are available from https://www.edugym.org/, while the full software package can be installed from https://github.com/RLG-Leiden/edugym.

Paper Structure (34 sections, 1 equation, 12 figures, 2 tables)

This paper contains 34 sections, 1 equation, 12 figures, 2 tables.

Introduction
Related Work
Environment suites
Toy environments
Algorithmic frameworks
Benchmarking
Background
Challenges and environments
a. Exploration: Boulder
b. On/off-policy: Roadrunner
c. Credit assignment depth: Study
d. State dimensionality: Catch
e. Partial observability: MemoryCorridor
f. Amount of state signal: Tamagotchi
g. Discrete/continuous tasks: Trashbot
...and 19 more sections

Figures (12)

Figure 1: EduGym environments. Top row: Boulder (illustrates exploration), Roadrunner (illustrates on/off-policy), Study (illustrates credit assignment depth). Middle row: Catch (illustrates dimensionality), MemoryCorridor (illustrates partial observability), Tamagotchi (illustrates state signal). Bottom row: Trashbot (illustrates continuous states/actions), Golf (illustrates stochasticity), Supermarket (illustrates model-based reinforcement learning). All environments are introduced in the "Challenges and environments" section, while full specifications are available in the appendix.
Figure 2: Comparison of different exploration methods on the Boulder environment. Subfigures progress in magnitude of the exploration challenge (height of the Boulder). Left: Both intrinsic motivation methods quickly solve the low Boulder of height 10, with $\epsilon$-greedy also catching up eventually. Middle: For the middle-height Boulder of height 30, reward-based intrinsic motivation starts to suffer, while $\epsilon$-greedy cannot solve it at all within 20k steps. Right: For the high Boulder of height 100, only goal-based intrinsic motivation manages to solve it. All agents use Q-learning, results averaged over 10 repetitions.
Figure 3: Left: Illustration of frame stacking/windowing to overcome partial observability on the MemoryCorridor. Agents that include more historical information in their state manage to advance further in the corridor. All agents use Q-learning, results averaged over 10 repetitions. Right: Illustration of two model-based RL algorithms (Dyna and Prioritised Sweeping) versus a model-free RL method (Q-learning) on the Supermarket. Both model-based RL methods learn faster since they use a learned model to propagate information through the value function more effectively. All agents use $\epsilon$-greedy exploration, results averaged over 10 repetitions.
...and 7 more figures
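
The frame stacking/windowing idea mentioned in the Figure 3 caption can be illustrated with a minimal sketch. It assumes discrete, hashable observations and a tabular agent; the class name `HistoryWindow` and its methods are hypothetical and are not part of the EduGym package. The idea is only that the agent's state becomes a tuple of the last few observations rather than the current observation alone.

```python
from collections import deque


class HistoryWindow:
    """Hypothetical helper (not an EduGym API): turns a stream of
    observations into a fixed-length tuple of the most recent ones."""

    def __init__(self, window_size, pad=None):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window_size = window_size
        self.pad = pad  # placeholder used before enough observations exist
        self._buffer = deque([pad] * window_size, maxlen=window_size)

    def reset(self, first_observation):
        """Clear the history at the start of an episode."""
        self._buffer = deque([self.pad] * self.window_size,
                             maxlen=self.window_size)
        return self.push(first_observation)

    def push(self, observation):
        """Append the newest observation and return the windowed state.

        The deque drops the oldest entry automatically once it is full.
        The returned tuple is hashable, so it can index a Q-table dict.
        """
        self._buffer.append(observation)
        return tuple(self._buffer)


# Usage: a window of 3 observations over a short episode.
window = HistoryWindow(window_size=3)
state = window.reset(5)   # (None, None, 5)
state = window.push(7)    # (None, 5, 7)
state = window.push(1)    # (5, 7, 1)
```

A larger window lets a tabular agent distinguish situations that look identical from a single observation, at the cost of a state space that grows exponentially with the window size.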
