Intrinsically-Motivated Humans and Agents in Open-World Exploration

Aly Lidayan, Yuqing Du, Eliza Kosoy, Maria Rufova, Pieter Abbeel, Alison Gopnik

TL;DR

This study investigates intrinsic motivation in open-ended exploration by directly comparing adults, children, and RL agents in Crafter, a Minecraft-like environment, across three intrinsic objectives: Entropy, Information Gain, and Empowerment. It finds that Entropy and Empowerment reliably correlate with human exploration progress, while Information Gain shows weaker associations; Entropy tends to rise quickly and plateau, whereas Empowerment grows steadily, implying a staged exploration strategy. Agents with intrinsic rewards underperform compared to extrinsic baselines, highlighting the need for better intrinsic reward designs that approximate the human-aligned objectives. The work also suggests that private speech, especially goal verbalizations, may aid children's exploration, pointing to a potential role of language in guiding exploration. The dataset and code are publicly available to spur further cross-disciplinary research in cognitive science and AI.

Abstract

What drives exploration? Understanding intrinsic motivation is a long-standing challenge in both cognitive science and artificial intelligence; numerous objectives have been proposed and used to train agents, yet there remains a gap between human and agent exploration. We directly compare adults, children, and AI agents in a complex open-ended environment, Crafter, and study how three common intrinsic objectives (Entropy, Information Gain, and Empowerment) relate to their behavior. We find that only Entropy and Empowerment are consistently positively correlated with human exploration progress, indicating that these objectives may better inform intrinsic reward design for agents. Furthermore, across agents and humans we observe that Entropy initially increases rapidly, then plateaus, while Empowerment increases continuously, suggesting that state diversity may provide more signal in early exploration, while advanced exploration should prioritize control. Finally, we find preliminary evidence that private speech utterances, and particularly goal verbalizations, may aid exploration in children. Our data is available at https://github.com/alyd/humans_in_crafter_data.

Paper Structure

This paper contains 17 sections, 6 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Left: Example screen from Crafter (Hafner, 2021). The player is at the center of the viewing window; the yellow arrow shows which direction they are facing. Their health, food, water and energy status are at the bottom left, the raw materials they have collected are at the bottom right, and the tools built so far are in the bottom row. Middle: Actions available to the human participants and RL agents. Right: We compare behaviors of children, adults, and RL agents.
  • Figure 2: Summary density histograms for each exploration score. For clarity, the left plot for each measure shows human scores and the right plot shows agent scores.
  • Figure 3: Histograms showing the distribution of each normalized information-theoretic objective attained by each group of humans and agents, their means and standard deviations over time, and a scatter plot of overall Entropy and Empowerment for each human participant and the mean and standard deviations for each type of agent.
  • Figure 4: Information-theoretic objectives vs. exploration scores. Adults and children are plotted as individual points, while the mean and standard deviation across random seeds are plotted for each type of AI agent. The line of best fit is plotted for adults and children when the correlation is significant ($p<0.05$).
  • Figure 5: Left: the distribution of word counts from the subjects' private speech. Center: the fraction of private speech utterances that were classified as expressing goals versus questions. Right: the fraction of verbalized goals vs Mean Achievement Score in children. We find a strong significant correlation ($\rho=0.8, p=0.005$ unadjusted). No significant correlation was found for the fraction of questions, or for the adults with any scores.
  • ...and 5 more figures