DREAMer-VXS: A Latent World Model for Sample-Efficient AGV Exploration in Stochastic, Unobserved Environments
Agniprabha Chakraborty
TL;DR
The paper tackles the sample inefficiency and brittleness of model-free RL for autonomous ground vehicle (AGV) exploration under partial observability. It introduces DREAMer-VXS, a latent world-model framework that encodes LiDAR observations with a convolutional VAE and models temporal dynamics with an RSSM, so the policy can be trained almost entirely in imagination via long-horizon latent rollouts. An intrinsic curiosity bonus derived from the world model's learning signal drives systematic exploration. In simulation, the authors report a 90% reduction in the environment interactions needed to reach expert-level performance relative to model-free SAC baselines, a 45% increase in exploration efficiency in unseen environments, and safer navigation with greater resilience to dynamic obstacles. The paper also outlines future directions in hierarchical planning, multi-modal sensing, and real-world deployment.
Abstract
Learning-based robotics holds immense promise, yet its translation to real-world applications is critically hindered by the sample inefficiency and brittleness of conventional model-free reinforcement learning algorithms. In this work, we address these challenges by introducing DREAMer-VXS, a model-based framework for Autonomous Ground Vehicle (AGV) exploration that learns to plan from imagined latent trajectories. Our approach centers on learning a world model from partial, high-dimensional LiDAR observations. This world model is composed of a Convolutional Variational Autoencoder (VAE), which learns a compact representation of the environment's structure, and a Recurrent State-Space Model (RSSM), which models the temporal dynamics. By using this learned model as a high-speed simulator, the agent can train its navigation policy almost entirely in imagination. This decouples policy learning from real-world interaction and yields a 90% reduction in the environment interactions required to reach expert-level performance compared with state-of-the-art model-free Soft Actor-Critic (SAC) baselines. The agent's behavior is governed by an actor-critic policy optimized with a composite reward function that balances task objectives with an intrinsic curiosity bonus, promoting systematic exploration of unknown spaces. Extensive simulated experiments show that DREAMer-VXS not only needs roughly ten times fewer interactions but also learns more generalizable and robust policies, achieving a 45% increase in exploration efficiency in unseen environments and superior resilience to dynamic obstacles.
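The idea of training "in imagination" with a composite reward can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the function names are hypothetical, a toy linear map stands in for the VAE and RSSM, the curiosity bonus is taken to be proportional to a world-model prediction error, and the weight beta is arbitrary.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def composite_reward(task_reward: float, model_error: float, beta: float = 0.1) -> float:
    """Task reward plus a curiosity bonus (assumed proportional to model error)."""
    return task_reward + beta * model_error


def imagine_rollout(
    z0: Vector,
    policy: Callable[[Vector], Vector],
    dynamics: Callable[[Vector, Vector], Vector],
    reward_fn: Callable[[Vector], float],
    horizon: int,
) -> List[Tuple[Vector, Vector, float]]:
    """Roll the policy forward inside the learned model; the real environment is never called."""
    z = list(z0)
    trajectory = []
    for _ in range(horizon):
        a = policy(z)              # action chosen from the latent state
        z_next = dynamics(z, a)    # predicted next latent state
        trajectory.append((z, a, reward_fn(z_next)))
        z = z_next
    return trajectory


def discounted_return(rewards: List[float], gamma: float = 0.99) -> float:
    """Discounted sum of rewards, accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Toy stand-ins (hypothetical): a linear latent model and a proportional policy.
def toy_dynamics(z: Vector, a: Vector) -> Vector:
    return [0.5 * zi + ai for zi, ai in zip(z, a)]


def toy_policy(z: Vector) -> Vector:
    return [-0.25 * zi for zi in z]


def toy_reward(z: Vector) -> float:
    task = -sum(abs(zi) for zi in z)  # reward for staying near the origin
    return composite_reward(task, model_error=0.0)  # no curiosity bonus in this toy


trajectory = imagine_rollout([1.0, -2.0], toy_policy, toy_dynamics, toy_reward, horizon=3)
rewards = [r for _, _, r in trajectory]
value_estimate = discounted_return(rewards)
```

In a full system, the imagined returns would serve as training targets for the actor and critic, while real interactions would be used mainly to refit the world model.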
