Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Cong Lu, Shengran Hu, Jeff Clune

TL;DR

The paper addresses the bottleneck of hard exploration in reinforcement learning by replacing handcrafted Go-Explore heuristics with the intelligence of giant foundation models. Intelligent Go-Explore (IGE) uses foundation models to judge state interestingness, select actions, and curate the archive, enabling robust, long-horizon exploration across language and vision tasks. Empirical results across Game of 24, BabyAI, and TextWorld show IGE significantly outperforms classic Go-Explore and state-of-the-art FM agents, demonstrating strong generalization and serendipitous discovery. This approach broadens the applicability of autonomous agents to a wider range of domains and modalities, highlighting a new direction for open-ended exploration driven by foundation-model intelligence.

Abstract

Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems built on the principle of archiving discovered states, and iteratively returning to and exploring from the most promising states. This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics to guide exploration (i.e., determine which states to save and explore from, and what actions to consider next), which is time-consuming and infeasible in general. To resolve this, we propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore by replacing these handcrafted heuristics with the intelligence and internalized human notions of interestingness captured by giant pretrained foundation models (FMs). This provides IGE with a human-like ability to instinctively identify how interesting or promising any new state is (e.g., discovering new objects, locations, or behaviors), even in complex environments where heuristics are hard to define. Moreover, IGE offers the exciting opportunity to recognize and capitalize on serendipitous discoveries -- states encountered during exploration that are valuable in terms of exploration, yet where what makes them interesting was not anticipated by the human user. We evaluate our algorithm on a diverse range of language and vision-based tasks that require search and exploration. Across these tasks, IGE strongly exceeds classic reinforcement learning and graph search baselines, and also succeeds where prior state-of-the-art FM agents like Reflexion completely fail. Overall, Intelligent Go-Explore combines the tremendous strengths of FMs and the powerful Go-Explore algorithm, opening up a new frontier of research into creating more generally capable agents with impressive exploration capabilities.
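The archive, return, and explore cycle described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the three callables `choose_state`, `choose_action`, and `is_interesting` mark the places where IGE would query a foundation model, and here they are replaced by hand-written stand-ins. The corridor environment, function names, and default parameters are all hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence


def go_explore(
    initial_state: int,
    actions_fn: Callable[[int], Sequence[int]],
    step_fn: Callable[[int, int], int],
    is_goal: Callable[[int], bool],
    choose_state: Callable[[Sequence[int]], int],        # FM query in IGE (stubbed here)
    choose_action: Callable[[int, Sequence[int]], int],  # FM query in IGE (stubbed here)
    is_interesting: Callable[[int, Sequence[int]], bool],  # FM query in IGE (stubbed here)
    iterations: int = 200,
    explore_steps: int = 5,
) -> Optional[List[int]]:
    """Return a list of actions that reaches a goal state, or None if none is found."""
    # The archive maps each saved state to the action sequence that reached it.
    archive: Dict[int, List[int]] = {initial_state: []}
    for _ in range(iterations):
        # "Go": pick a promising archived state and return to it.
        # In this toy setting the state can be restored directly.
        state = choose_state(list(archive))
        trajectory = list(archive[state])
        # "Explore": take a few actions from the restored state.
        for _ in range(explore_steps):
            actions = actions_fn(state)
            if not actions:
                break
            action = choose_action(state, actions)
            state = step_fn(state, action)
            trajectory.append(action)
            if is_goal(state):
                return trajectory
            # Archive filtering: keep only new states judged interesting.
            if state not in archive and is_interesting(state, list(archive)):
                archive[state] = list(trajectory)
    return None


# Hypothetical toy environment: a corridor of integer positions, goal at 10.
GOAL = 10


def corridor_actions(state: int) -> List[int]:
    return [1] if state == 0 else [-1, 1]


def corridor_step(state: int, action: int) -> int:
    return state + action


def solve_corridor(seed: int = 0) -> Optional[List[int]]:
    rng = random.Random(seed)
    return go_explore(
        initial_state=0,
        actions_fn=corridor_actions,
        step_fn=corridor_step,
        is_goal=lambda s: s == GOAL,
        choose_state=max,  # stand-in heuristic: resume from the farthest position
        choose_action=lambda s, acts: rng.choice(acts),  # stand-in: random action
        is_interesting=lambda s, seen: True,  # stand-in: every new state is kept
    )


if __name__ == "__main__":
    print(solve_corridor())
```

Passing the three decisions in as callables keeps the loop itself unchanged when the stand-ins are swapped for model queries, which mirrors the abstract's description of replacing handcrafted heuristics rather than the surrounding algorithm.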

Paper Structure

This paper contains 32 sections, 5 figures, 8 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Intelligent Go-Explore (IGE) integrates the intelligence and internalized human notions of interestingness from giant pretrained FMs into all stages of the Go-Explore algorithm (Ecoffet et al., 2021), enabling FM agents to robustly explore in complex environments. Bottom: Classic Go-Explore solved hard exploration problems by archiving novel discovered states, resetting to promising ones via domain-specific heuristics, and then performing random exploration. Top: Our approach, Intelligent Go-Explore, enables Go-Explore to tackle virtually any type of problem that is representable in the context of a large language or multimodal model. Instead of manually defining heuristics, we query the foundation model at all stages, enabling our approach to automatically catch and return to serendipitous discoveries, and harness the power of FM agents to explore. The environment shown is the BabyAI game used in the BabyAI evaluation subsection.
  • Figure 2: IGE explores the Game of 24 with the intelligence of FMs and reaches a 100% success rate, on average 70.8% faster than DFS, the next best baseline. IGE completes all problems within 150 environment operations. Our use of archiving and intelligent action selection allows us to greatly outperform prior LLM agents given an equal number of operations. The success rate is computed over 100 test problems.
  • Figure 3: IGE can enable GPT-4o to efficiently find solutions to challenging tasks in the BabyAI text and visual environments. In the text domain, IGE does so with orders of magnitude fewer online steps than prior RL-trained baselines (GLAM; Carta et al., 2023). Task types are in order of difficulty. As tasks become more difficult, the performance gap of IGE vs. the LLM baselines grows. We show the mean and 95% bootstrap confidence interval (Zoubir and Iskander, 2007) over 25 seeds per environment type. Here, and elsewhere, confidence intervals are obtained by bootstrapped resampling 10,000 times.
  • Figure 4: IGE outperforms state-of-the-art FM agents in three challenging text games in TextWorld. These results illustrate the powerful capabilities of planning, commonsense reasoning, and exploration of IGE (see the TextWorld evaluation subsection). Notably, in the Coin Collector game where hard exploration is required, we observe BFS-like search behavior emerge in IGE, enabling it to find the most efficient solution where all other approaches exhaust the environment horizon. We show the mean and 95% bootstrap confidence interval over 25 seeds for each game.
  • Figure 5: We visualize the five types of BabyAI tasks used in our evaluation (see the BabyAI evaluation subsection). IGE receives only partial text-based observations corresponding to the view in the figure.
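
The captions for Figures 3 and 4 report a mean with a 95% bootstrap confidence interval over 25 seeds, using 10,000 resamples. A minimal sketch of one way to compute such an interval is shown below. It assumes a percentile bootstrap of the mean, which the captions do not specify, and the function name, seed handling, and example data are hypothetical rather than taken from the paper.

```python
import random
from typing import Sequence, Tuple


def bootstrap_mean_ci(
    samples: Sequence[float],
    n_resamples: int = 10_000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a percentile bootstrap of the mean."""
    if not samples:
        raise ValueError("samples must be non-empty")
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    resampled_means = sorted(
        sum(rng.choices(samples, k=n)) / n for _ in range(n_resamples)
    )
    # Read the interval bounds off the sorted bootstrap distribution.
    alpha = (1.0 - confidence) / 2.0
    lower_index = int(alpha * n_resamples)
    upper_index = min(n_resamples - 1, int((1.0 - alpha) * n_resamples))
    mean = sum(samples) / n
    return mean, resampled_means[lower_index], resampled_means[upper_index]


if __name__ == "__main__":
    # Made-up outcomes for 25 seeds (1.0 = success, 0.0 = failure).
    outcomes = [1.0] * 20 + [0.0] * 5
    print(bootstrap_mean_ci(outcomes))
```

With only 25 binary outcomes per setting the resulting interval is fairly wide, which is worth keeping in mind when comparing methods whose intervals overlap.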