Learning World Models for Unconstrained Goal Navigation

Yuanlin Duan, Wensen Mao, He Zhu

TL;DR

A novel goal-directed exploration algorithm, MUN (short for "World Models for Unconstrained Goal Navigation"), capable of modeling state transitions between arbitrary subgoal states in the replay buffer, thereby facilitating the learning of policies to navigate between any "key" states.

Abstract

Learning world models offers a promising avenue for goal-conditioned reinforcement learning with sparse rewards. By allowing agents to plan actions or exploratory goals without direct interaction with the environment, world models enhance exploration efficiency. The quality of a world model hinges on the richness of the data stored in the agent's replay buffer, and the model is expected to generalize reasonably to the region of the state space surrounding recorded trajectories. However, learned world models struggle to generalize to state transitions that run backward along recorded trajectories or that connect states from different trajectories, which limits their ability to accurately model real-world dynamics. To address these challenges, we introduce a novel goal-directed exploration algorithm, MUN (short for "World Models for Unconstrained Goal Navigation"). This algorithm is capable of modeling state transitions between arbitrary subgoal states in the replay buffer, thereby facilitating the learning of policies to navigate between any "key" states. Experimental results demonstrate that MUN strengthens the reliability of world models and significantly improves the policy's capacity to generalize to new goal settings.
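The abstract's central idea, obtaining experience for transitions between arbitrary pairs of stored states rather than only along recorded trajectories, can be illustrated with a small sampling routine. This is a hypothetical sketch, not the authors' implementation: it assumes the replay buffer is a list of trajectories whose states are tuples of floats, it draws from all stored states as a stand-in for MUN's "key" states, and the names `SubgoalPair`, `classify_pair`, and `sample_subgoal_pairs` are invented for illustration. It only shows how independently drawn (start, goal) pairs cover the forward, backward, and cross-trajectory cases the abstract mentions; how MUN actually selects key states is described in the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

State = Tuple[float, ...]  # hypothetical state representation


@dataclass
class SubgoalPair:
    start: State
    goal: State
    kind: str  # "forward", "backward", or "cross"


def classify_pair(traj_a: int, idx_a: int, traj_b: int, idx_b: int) -> str:
    """Label a (start, goal) pair by how it relates to the recorded data."""
    if traj_a != traj_b:
        return "cross"  # endpoints come from different trajectories
    # Same trajectory: forward if the goal was recorded later, else backward.
    return "forward" if idx_a < idx_b else "backward"


def sample_subgoal_pairs(buffer: Sequence[Sequence[State]], n: int,
                         rng: random.Random) -> List[SubgoalPair]:
    """Sample n (start, goal) pairs from all stored states.

    Both endpoints are drawn independently of trajectory order, so backward
    and cross-trajectory pairs appear alongside forward ones. Requires at
    least two stored states in total.
    """
    # Flat index of every (trajectory, position) in the buffer.
    index = [(t, i) for t, traj in enumerate(buffer) for i in range(len(traj))]
    pairs = []
    for _ in range(n):
        # rng.sample picks two distinct positions, so start and goal differ.
        (ta, ia), (tb, ib) = rng.sample(index, 2)
        pairs.append(SubgoalPair(buffer[ta][ia], buffer[tb][ib],
                                 classify_pair(ta, ia, tb, ib)))
    return pairs


if __name__ == "__main__":
    # Two toy trajectories with one-dimensional states.
    toy_buffer = [[(0.0,), (1.0,), (2.0,)], [(5.0,), (6.0,)]]
    sampled = sample_subgoal_pairs(toy_buffer, 200, random.Random(0))
    print({p.kind for p in sampled})
```

In a full agent, each pair would presumably be handed to a goal-conditioned policy as a navigation task, with the resulting real transitions added to the buffer for world-model training; that loop is omitted here.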

Paper Structure

This paper contains 38 sections, 11 equations, 11 figures, 7 tables, and 10 algorithms.

Figures (11)

  • Figure 1: The general framework of model-based RL.
  • Figure 2: The key states involved in completing the 3-block stacking task, and the significant advantages of the bidirectional replay buffer used in MUN over traditional methods for learning world models.
  • Figure 3: We evaluate MUN on 6 environments: Ant Maze, Walker, 3-Block Stacking, Block Rotation, Pen Rotation, Fetch Slide.
  • Figure 4: Experiment results comparing MUN with the baselines over 5 random seeds.
  • Figure 5: The world model prediction error curves throughout the training steps for 3-Block Stacking and Pen Rotation.
  • ...and 6 more figures