Open-World Reinforcement Learning over Long Short-Term Imagination

Jiajian Li; Qi Wang; Yunbo Wang; Xin Jin; Yang Li; Wenjun Zeng; Xiaokang Yang

TL;DR

LS-Imagine introduces a long short-term imagination framework for open-world visual RL, in which the world model generates both instant (single-step) transitions and jumpy, long-horizon transitions. Affordance maps, computed by zooming in on image regions and then predicted more efficiently by a multimodal U-Net, guide exploration and supply intrinsic rewards; both kinds of transition are combined in an actor-critic pipeline trained on mixed imaginations. The approach outperforms strong baselines on MineDojo tasks, and the analysis examines the roles of long-range planning, jumpy transitions, and affordance-driven guidance. The results suggest gains in exploration efficiency and long-horizon value estimation for open-world, vision-based RL.

Abstract

Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted", as they are typically trained on short snippets of imagined experiences. We argue that the primary challenge in open-world decision-making is improving the exploration efficiency across a vast state space, especially for tasks that demand consideration of long-horizon payoffs. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a $\textit{long short-term world model}$. To achieve this, we simulate goal-conditioned jumpy state transitions and compute corresponding affordance maps by zooming in on specific areas within single images. This facilitates the integration of direct long-term values into behavior learning. Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.
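To make the zoom-in idea concrete, here is a minimal sketch of how an affordance-style map could be assembled from a single image. It is not the authors' implementation: the function names, the window and stride sizes, and the `toy_score` scorer are all assumptions made for illustration. In the paper, the task-correlation score of each virtual exploration comes from MineCLIP; here a brightness-based placeholder stands in for it so that the example runs on its own.

```python
import numpy as np


def toy_score(crop: np.ndarray) -> float:
    """Hypothetical stand-in for a task-correlation scorer (mean brightness)."""
    return float(crop.mean())


def affordance_map(image: np.ndarray, scorer=toy_score,
                   window: int = 16, stride: int = 8) -> np.ndarray:
    """Slide a window over the image, score each crop, and average the
    scores of all windows covering each pixel. Returns a map in [0, 1]."""
    h, w = image.shape[:2]
    acc = np.zeros((h, w), dtype=np.float64)  # summed scores per pixel
    cnt = np.zeros((h, w), dtype=np.float64)  # number of covering windows
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            crop = image[top:top + window, left:left + window]
            score = scorer(crop)
            acc[top:top + window, left:left + window] += score
            cnt[top:top + window, left:left + window] += 1.0
    avg = np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
    lo, hi = avg.min(), avg.max()
    if hi - lo < 1e-12:  # flat map: nothing stands out
        return np.zeros_like(avg)
    return (avg - lo) / (hi - lo)


if __name__ == "__main__":
    # Assumed toy input: a dark 64x64 frame with one bright "target" patch.
    frame = np.zeros((64, 64), dtype=np.float64)
    frame[40:56, 8:24] = 1.0
    amap = affordance_map(frame)
    row, col = np.unravel_index(np.argmax(amap), amap.shape)
    print("peak of the map at:", (int(row), int(col)))
```

Swapping `toy_score` for a learned vision-language scorer would bring the sketch closer to the procedure described above, at the cost of one model call per window. That cost is presumably what motivates learning a network to predict the map directly.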

Paper Structure

This paper contains 40 sections, 11 equations, 16 figures, 4 tables, and 1 algorithm.

Introduction
Problem Formulation and Notations
Method
Overview of LS-Imagine
Affordance Map and Intrinsic Reward
Affordance Map Computation via Virtual Exploration
Multimodal U-Net for Rapid Affordance Map Generation
Affordance-Driven Intrinsic Reward
Long Short-Term World Model
Learning Jumping Flags
Learning Jumpy State Transitions
Behavior Learning over Mixed Long Short-Term Imaginations
Experiments
Implementation Details
Main Comparison
...and 25 more sections

Figures (16)

Figure 1: The general framework of LS-Imagine, an MBRL agent that operates solely on raw pixels. The fundamental idea is to extend the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback.
Figure 2: The two steps for on-the-fly affordance map estimation: (a) Simulate exploration via image zoom-in and calculate the task-correlation scores of the virtual explorations using MineCLIP. (b) Learn to generate affordance maps more efficiently using a multimodal U-Net.
Figure 3: The overall architecture of the world model and the behavior learning process.
Figure 4: Comparison of LS-Imagine against strong Minecraft agents, including DreamerV3 (Hafner et al., 2023), VPT (Baker et al., 2022), STEVE-1 (Lifshitz et al., 2023), PTGM (Yuan et al., 2024), and Director (Hafner et al., 2022). The numerical results are presented in a table in the appendix.
Figure 5: The number of steps per episode for task completion.
...and 11 more figures
