Exploration-Driven Generative Interactive Environments

Nedko Savov, Naser Kazemi, Mohammad Mahdi, Danda Pani Paudel, Xi Wang, Luc Van Gool

TL;DR

The paper tackles the data bottleneck in training large multi-environment world models by introducing RetroAct and an open GenieRedux family (GenieRedux and GenieRedux-G). It proposes AutoExplore, an uncertainty-driven exploration agent that collects diverse data without environment rewards, improving both visual fidelity and controllability after fine-tuning. By pretraining on large subsets of labeled environments (Platformers-200) and refining with data from AutoExplore and token-based losses like TDCE, the approach achieves substantial gains (e.g., up to 7.4 PSNR in fidelity and up to 1.4 ΔPSNR in controllability) and demonstrates improved generalization across unseen scenes. The work provides a cost-effective, environment-agnostic framework for pretraining world models at scale, supported by public code and the RetroAct dataset, enabling easier deployment in new domains.

Abstract

Modern world models require costly and time-consuming collection of large video datasets with action demonstrations by people or by environment-specific agents. To simplify training, we focus on using many virtual environments for inexpensive, automatically collected interaction data. Genie, a recent multi-environment world model, demonstrates the ability to simulate many environments with shared behavior. Unfortunately, training this model requires expensive demonstrations. Therefore, we propose a training framework that uses only a random agent in virtual environments. While the model trained in this manner exhibits good controls, it is limited by what random exploration can reach. To address this limitation, we propose AutoExplore Agent, an exploration agent that relies entirely on the uncertainty of the world model and delivers diverse data from which the model can learn best. Our agent is fully independent of environment-specific rewards and thus adapts easily to new environments. With this approach, the pretrained multi-environment model can quickly adapt to new environments, improving both video fidelity and controllability. To automatically obtain large-scale interaction datasets for pretraining, we group environments with similar behavior and controls. To this end, we annotate the behavior and controls of 974 virtual environments, a dataset that we name RetroAct. To build our model, we first create an open implementation of Genie, GenieRedux, and then apply enhancements and adaptations in our version, GenieRedux-G. Our code and data are available at https://github.com/insait-institute/GenieRedux.
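
The abstract describes an exploration agent whose reward comes only from the world model's uncertainty, with no environment-specific reward. As a rough illustration of that idea, the sketch below scores a predicted frame by the mean entropy of the model's per-token class distributions. Using Shannon entropy as the uncertainty measure, the function name `token_entropy_reward`, and the `(num_tokens, vocab_size)` logits layout are all assumptions made for this example; they are not taken from the paper or from the GenieRedux code.

```python
import numpy as np


def token_entropy_reward(logits: np.ndarray) -> float:
    """Hypothetical uncertainty reward: mean Shannon entropy (in nats).

    logits: array of shape (num_tokens, vocab_size) holding the world
    model's unnormalized scores for each predicted frame token.
    This layout is an assumption for illustration only.
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)

    # Entropy per token, then averaged over all tokens in the frame.
    entropy = -(probs * log_probs).sum(axis=-1)
    return float(entropy.mean())


# A model that is unsure (uniform scores) earns a higher reward than one
# that is confident (one dominant class per token).
unsure = np.zeros((4, 8))
confident = np.zeros((4, 8))
confident[:, 0] = 50.0

reward_unsure = token_entropy_reward(unsure)
reward_confident = token_entropy_reward(confident)
```

Under this sketch, an agent maximizing the reward is pushed toward states the world model predicts poorly, which is the kind of data the abstract says is most useful for fine-tuning.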

Paper Structure

This paper contains 18 sections, 3 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Our proposed world model training framework. It consists of a multi-environment world model pretrained on random agent data, and a new AutoExplore Agent that explores an environment and delivers diverse data for fine-tuning.
  • Figure 2: Method Overview. We propose an alternative to the costly collection of human interaction data: exploring environments with an agent. The reward is based solely on the classification uncertainty of our model.
  • Figure 3: RetroAct Annotation. Description of environments in RetroAct by annotated attribute. Better viewed zoomed.
  • Figure 3: Comparison of GenieRedux and GenieRedux-G on Diverse Test Set. The models are trained with data collected by random agent and trained agent (-TA), and tested on data collected by a trained agent from the Coinrun environment.
  • Figure 4: Control Of GenieRedux-G-50. Demonstrating all controls of our multi-environment model on multiple games.
  • ...and 4 more figures