Comparing Reinforcement Learning and Human Learning using the Game of Hidden Rules

Eric Pulick, Vladimir Menkov, Yonatan Mintz, Paul Kantor, Vicki Bier

TL;DR

The paper addresses how task structure influences learning by introducing the Game of Hidden Rules (GOHR), a controllable environment that encodes hidden rules as rule lines and atoms on a $6\times 6$ board with $4$ buckets and $144$ actions (each of the $36$ cells paired with one of the $4$ buckets). It compares human learners with two RL agents (DQN and REINFORCE) across stationary versus non-stationary rules and varying rule generality, using memory-aware featurizations. Key contributions include a flexible rule syntax that enables precise manipulation of task structure, systematic comparisons of human learning (HL) and RL across structured tasks, and empirical findings that humans and RL agents respond differently to task structure: RL agents adapt more easily to rule generality, while humans show selective difficulties. This work advances a task-oriented understanding of RL and HL and provides a shareable platform for studies of curricula, transfer learning, and human–machine teaming.

Abstract

Reliable real-world deployment of reinforcement learning (RL) methods requires a nuanced understanding of their strengths and weaknesses and how they compare to those of humans. Human-machine systems are becoming more prevalent and the design of these systems relies on a task-oriented understanding of both human learning (HL) and RL. Thus, an important line of research is characterizing how the structure of a learning task affects learning performance. While increasingly complex benchmark environments have led to improved RL capabilities, such environments are difficult to use for the dedicated study of task structure. To address this challenge we present a learning environment built to support rigorous study of the impact of task structure on HL and RL. We demonstrate the environment's utility for such study through example experiments in task structure that show performance differences between humans and RL algorithms.

Paper Structure (35 sections, 12 figures, 4 tables, 2 algorithms)

Figures (12)

  • Figure 1: Game board diagram (left) and a sample board with four shapes and colors (right).
  • Figure 2: Sample learning runs for a human (left) and DQN (right), plotting cumulative error count against the move and episode indices, respectively. The human's learning run is summarized by an $m^*$ of 34, the first move of the first streak of 10+ correct moves. The DQN's learning run is summarized by a TCE of 305, the error count after 4000 episodes.
  • Figure 3: Base rule performance of humans (left) and RL players (right). ECDF curves denote the fraction of human players achieving an $m^*$ streak by a given move index on each rule ('Never' indicates player does not achieve such a streak). Strip plots of TCE distributions of each rule are provided for DQN and REINFORCE, separated due to different TCE magnitudes ('C.N.M.' indicates convergence criteria were not met for that learning run). Each dot corresponds to a learning run.
  • Figure 4: ECDFs of $m^*$ distributions for each rule family. Stationary rules (shape/quadrant) shown on left. Non-stationary rules (clockwise/alternating) shown on right. Base rules shown in blue.
  • Figure 5: Empirical cumulative distribution plots denoting the fraction of players who achieved an $m^*$ streak by a given move index for all base rules (including CM).
  • ...and 7 more figures
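
The two summary statistics defined in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a learning run is available as a per-move list of correct/incorrect flags (for $m^*$) or a per-episode list of error counts (for TCE), and that move indices are 1-based. The function names `m_star` and `tce` and these input formats are hypothetical and not taken from the paper's code.

```python
from typing import Optional, Sequence


def m_star(correct: Sequence[bool], streak_len: int = 10) -> Optional[int]:
    """First move (1-based, an assumption) of the first streak of
    `streak_len` or more consecutive correct moves.

    Returns None when no such streak occurs (the 'Never' case).
    """
    run = 0  # length of the current run of correct moves
    for i, ok in enumerate(correct, start=1):
        run = run + 1 if ok else 0
        if run == streak_len:
            # The streak began streak_len - 1 moves before move i.
            return i - streak_len + 1
    return None


def tce(errors_per_episode: Sequence[int], n_episodes: int = 4000) -> int:
    """Error count accumulated over the first `n_episodes` episodes."""
    return sum(errors_per_episode[:n_episodes])


if __name__ == "__main__":
    # Three wrong moves, then ten correct moves: the streak starts at move 4.
    run_flags = [False] * 3 + [True] * 10
    print(m_star(run_flags))
    # Errors summed over the first two of three episodes.
    print(tce([1, 0, 2], n_episodes=2))
```

Returning `None` rather than a sentinel number keeps runs that never reach a streak out of any numeric summary, mirroring how the ECDF plots show them separately under 'Never'.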