OMNI: Open-endedness via Models of human Notions of Interestingness

Jenny Zhang, Joel Lehman, Kenneth Stanley, Jeff Clune

TL;DR

OMNI tackles the Achilles' Heel of open-ended learning by encoding human notions of interestingness into a Model of Interestingness (MoI) derived from foundation models, and combining it with a Learning Progress Curriculum (LP) to prioritize tasks that are both learnable and worthwhile. The approach is validated in finite-task domains (Crafter, BabyAI) and an infinite-task space (AI2-THOR), where OMNI consistently outperforms uniform sampling and LP alone, and approaches oracle MoI performance. In the infinite-task setting, a GPT-4 driven task generator paired with LP demonstrates sustained discovery of learnable tasks, with MoI-filtering further enhancing efficiency and diversity. The results suggest a general, scalable recipe for auto-curricula that leverages human-aligned judgments to steer open-ended exploration toward meaningful progress, while highlighting avenues for safety and refinement via human feedback.

Abstract

Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast environment search space, but there are thus infinitely many possible tasks. Even after filtering for tasks the current agent can learn (i.e., learning progress), countless learnable yet uninteresting tasks remain (e.g., minor variations of previously learned tasks). An Achilles' Heel of open-endedness research is the inability to quantify (and thus prioritize) tasks that are not just learnable, but also interesting (e.g., worthwhile and novel). We propose solving this problem by Open-endedness via Models of human Notions of Interestingness (OMNI). The insight is that we can utilize foundation models (FMs) as a model of interestingness (MoI), because they already internalize human concepts of interestingness from training on vast amounts of human-generated data, where humans naturally write about what they find interesting or boring. We show that FM-based MoIs improve open-ended learning by focusing on tasks that are both learnable and interesting, outperforming baselines based on uniform task sampling or learning progress alone. This approach has the potential to dramatically advance the ability to intelligently select which tasks to focus on next (i.e., auto-curricula), and could be seen as AI selecting its own next task to learn, facilitating self-improving AI and AI-Generating Algorithms. Project website at https://www.jennyzhangzt.com/omni/
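The idea of prioritizing tasks that are both learnable and interesting can be illustrated with a minimal Python sketch. This is not the paper's implementation. The learning-progress proxy (the gap between a fast and a slow moving average of success rate), the hard zero weight for uninteresting tasks, and all function and task names below are assumptions made for illustration. The interestingness labels, which OMNI would obtain from a foundation model, are passed in as a plain dictionary.

```python
import random


def learning_progress(fast_avg, slow_avg):
    """Hypothetical LP proxy: how far recent success has moved from the long-run average."""
    return abs(fast_avg - slow_avg)


def sampling_weights(success_stats, interesting):
    """Return a normalized sampling weight per task.

    success_stats: task name -> (fast_avg, slow_avg) of the success rate.
    interesting:   task name -> bool, standing in for a model-of-interestingness verdict.
    Uninteresting tasks get zero weight here (a simplifying assumption).
    """
    raw = {
        name: learning_progress(fast, slow) if interesting[name] else 0.0
        for name, (fast, slow) in success_stats.items()
    }
    total = sum(raw.values())
    if total == 0.0:
        # No measurable progress anywhere: fall back to uniform sampling over
        # the interesting tasks, or over all tasks if none is interesting.
        pool = [n for n in success_stats if interesting[n]] or list(success_stats)
        return {n: (1.0 / len(pool) if n in pool else 0.0) for n in success_stats}
    return {name: weight / total for name, weight in raw.items()}


def sample_task(success_stats, interesting, rng=random):
    """Draw the next training task in proportion to its weight."""
    weights = sampling_weights(success_stats, interesting)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Illustrative, made-up numbers: a minor variation of a learned task is
# marked uninteresting and is therefore never selected.
stats = {
    "collect wood": (0.6, 0.4),
    "collect wood twice": (0.5, 0.3),
    "place table": (0.2, 0.1),
}
labels = {"collect wood": True, "collect wood twice": False, "place table": True}
next_task = sample_task(stats, labels)
```

In this sketch, a task with high learning progress but an uninteresting label contributes nothing to the curriculum, which is the filtering effect the abstract describes. A softer variant could down-weight such tasks instead of excluding them.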

Paper Structure (41 sections, 2 equations, 30 figures, 1 table, 2 algorithms)

Figures (30)

  • Figure 1: Overview of OMNI. OMNI enables open-ended learning in vast environment search spaces by ensuring that tasks trained on not only have high learning progress, but are also interesting (harnessing large AI models to make such a heretofore impossible judgement).
  • Figure 2: Crafter and BabyAI environments. (Left) Agent view in a procedurally generated Crafter world, showing terrain types, resources, and the agent's inventory. (Middle) The 15 tasks considered interesting for Crafter analyses. Arrows indicate which tasks in the technology tree must be completed, often multiple times, along the way to perform more challenging tasks. (Right) Bird's-eye view of a randomly generated BabyAI environment, showing different object types, colors, locations, and states. The agent is the red triangle and its view (sometimes occluded) is highlighted in light grey. In this example, the agent starts from the bottom right room, and is tasked to "go to a red ball". To succeed, the agent must open the green door (sometimes locked) to reach the red ball.
  • Figure 3: Results in Crafter. (Left) Conditional success probabilities of all tasks in Crafter. Tasks are organized from simple to complex based on the prerequisite tasks that must be accomplished before completing the target task. Task names (left of each row) are readable in a digital format with zoom. (Right) Performance in Crafter on all tasks. While OMNI biases training towards interesting tasks, it achieves higher average task success rates and learns more tasks than uniform sampling or choosing tasks based on learning progress alone, even across all tasks.
  • Figure 5: AI2-THOR environment and results. (Left) Agent's egocentric view and bird's-eye view in an AI2-THOR kitchen environment. (Right) OMNI learns more tasks than the Learning Progress and Uniform sampling baselines. Example tasks learned by OMNI are shown in gray boxes.
  • Figure 6: The process of determining an agent's learning progress on a task from its measured success probability on that task in an example fictional problem.
  • ...and 25 more figures