Exploration-Driven Generative Interactive Environments

Nedko Savov; Naser Kazemi; Mohammad Mahdi; Danda Pani Paudel; Xi Wang; Luc Van Gool

TL;DR

The paper tackles the data bottleneck in training large multi-environment world models. It introduces RetroAct, a dataset of annotated virtual environments, and an open GenieRedux model family (GenieRedux and GenieRedux-G). It also proposes AutoExplore, an uncertainty-driven exploration agent that collects diverse data without environment rewards; fine-tuning on this data improves both visual fidelity and controllability. Pretraining on a large subset of labeled environments (Platformers-200), then refining with AutoExplore data and token-based losses such as TDCE, yields gains of up to 7.4 PSNR in fidelity and up to 1.4 ΔPSNR in controllability, along with better generalization to unseen scenes. The result is a cost-effective, environment-agnostic framework for pretraining world models at scale, supported by public code and the RetroAct dataset, which eases deployment in new domains.
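To make the two reported metrics concrete, here is a minimal sketch of how PSNR and ΔPSNR might be computed. It assumes frames are flattened to sequences of pixel values in [0, 1], and it assumes ΔPSNR follows the Genie-style definition: PSNR of frames generated with the true actions minus PSNR of frames generated with randomly sampled actions. The function names `psnr` and `delta_psnr` are hypothetical and are not taken from the GenieRedux code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], prediction: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio (in dB) between two flattened frames."""
    if len(reference) != len(prediction) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def delta_psnr(reference: Sequence[float],
               pred_true_actions: Sequence[float],
               pred_random_actions: Sequence[float],
               max_value: float = 1.0) -> float:
    """Assumed controllability score: how much fidelity drops when the
    world model is conditioned on random actions instead of the true ones."""
    return (psnr(reference, pred_true_actions, max_value)
            - psnr(reference, pred_random_actions, max_value))
```

Under this assumed definition, a larger ΔPSNR means the generated frames depend more strongly on the supplied actions, which is why it is read as a controllability measure.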

Abstract

Modern world models require costly and time-consuming collection of large video datasets with action demonstrations by people or by environment-specific agents. To simplify training, we focus on using many virtual environments for inexpensive, automatically collected interaction data. Genie, a recent multi-environment world model, demonstrates the ability to simulate many environments with shared behavior. Unfortunately, training Genie requires expensive demonstrations. Therefore, we propose a training framework that uses only a random agent in virtual environments. While a model trained in this manner exhibits good controls, it is limited by what random exploration can reach. To address this limitation, we propose the AutoExplore Agent, an exploration agent that relies entirely on the uncertainty of the world model and delivers the diverse data from which the model can learn best. Our agent is fully independent of environment-specific rewards and thus adapts easily to new environments. With this approach, the pretrained multi-environment model can quickly adapt to new environments, improving both video fidelity and controllability. To obtain large-scale interaction datasets for pretraining automatically, we group environments with similar behavior and controls. To this end, we annotate the behavior and controls of 974 virtual environments, a dataset that we name RetroAct. To build our model, we first create an open implementation of Genie, GenieRedux, and then apply enhancements and adaptations in our version, GenieRedux-G. Our code and data are available at https://github.com/insait-institute/GenieRedux.
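The idea of exploring by world-model uncertainty can be illustrated with a small sketch. The abstract does not specify how uncertainty is measured or how the agent is trained, so everything below is an assumption made for illustration: uncertainty is taken to be the mean entropy of the predicted next-frame token distributions, and the agent greedily picks the most uncertain action. The names `token_entropy`, `pick_most_uncertain_action` and `toy_world_model` are hypothetical and do not come from the GenieRedux code.

```python
import math
from typing import Any, Callable, Sequence

# Assumed interface: predict(state, action) returns one probability
# distribution per predicted image token.
Predictor = Callable[[Any, str], Sequence[Sequence[float]]]


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_uncertainty(token_distributions: Sequence[Sequence[float]]) -> float:
    """Average entropy over all predicted tokens of the next frame."""
    return sum(token_entropy(d) for d in token_distributions) / len(token_distributions)


def pick_most_uncertain_action(predict: Predictor, state: Any,
                               actions: Sequence[str]) -> str:
    """Greedy choice: the action whose predicted outcome is least certain."""
    scores = {a: mean_uncertainty(predict(state, a)) for a in actions}
    return max(actions, key=lambda a: scores[a])


def toy_world_model(state: Any, action: str) -> Sequence[Sequence[float]]:
    """Stand-in model: confident about 'left', unsure about 'jump'."""
    if action == "jump":
        return [[0.25, 0.25, 0.25, 0.25]] * 2
    return [[0.97, 0.01, 0.01, 0.01]] * 2


chosen = pick_most_uncertain_action(toy_world_model, None, ["left", "jump"])
```

In this toy setting the agent chooses `"jump"`, the action the model predicts least confidently. No environment reward appears anywhere in the selection, which mirrors the reward-independence the abstract describes.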
