From Imitation to Exploration: End-to-end Autonomous Driving based on World Model
Yueyuan Li, Mingyang Jiang, Songan Zhang, Wei Yuan, Chunxiang Wang, Ming Yang
TL;DR
RAMBLE addresses generalization gaps in end-to-end autonomous driving by combining imitation learning (IL) with a world-model-based reinforcement learning (RL) framework. It introduces an asymmetric VAE (V model) for multi-modal perception, a Transformer-based M model for dynamics, and an SAC-based C model for control. All three are trained through a staged IL-to-RL curriculum and guided by a differentiable action mask. The approach achieves state-of-the-art route completion on CARLA Leaderboard 1.0 and completes all 38 interactive scenarios on CARLA Leaderboard 2.0, demonstrating robust performance across diverse weather and traffic conditions. The work highlights the value of combining imitation with exploration for efficient and safe driving policy learning, and the authors plan to release RAMBLE as open source to accelerate future research.
Abstract
In recent years, end-to-end autonomous driving architectures have gained increasing attention because they avoid the error accumulation that arises between separately designed modules. Most existing end-to-end autonomous driving methods are based on Imitation Learning (IL), which can quickly derive driving strategies by mimicking expert behavior. However, IL often struggles with scenarios outside the training dataset, especially in highly dynamic and interaction-intensive traffic environments. In contrast, Reinforcement Learning (RL)-based driving models can optimize driving decisions through interaction with the environment, improving adaptability and robustness. To leverage the strengths of both IL and RL, we propose RAMBLE, an end-to-end world-model-based RL method for driving decision-making. RAMBLE extracts environmental context from RGB images and LiDAR data through an asymmetric variational autoencoder. A Transformer-based architecture then captures the dynamic transitions of traffic participants. Next, an actor-critic RL algorithm derives driving strategies from the latent features of the current state and its dynamics. To accelerate policy convergence and ensure stable training, we introduce a training scheme that initializes the policy network with IL and employs a Kullback-Leibler (KL) divergence loss together with soft update mechanisms to smoothly transition the model from IL to RL. RAMBLE achieves state-of-the-art performance in route completion rate on CARLA Leaderboard 1.0 and completes all 38 scenarios on CARLA Leaderboard 2.0, demonstrating its effectiveness in handling complex and dynamic traffic scenarios. The model will be open-sourced upon paper acceptance at https://github.com/SCP-CN-001/ramble to support further research and development in autonomous driving.
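The abstract names a KL loss and soft updates as the mechanism for moving from IL to RL but does not give their exact form. The sketch below illustrates one common way such a transition can be built, under loudly stated assumptions: both policies output diagonal Gaussians over the action (tanh squashing used in SAC is omitted), the penalty is KL(pi_RL || pi_IL), its weight decays linearly, and "soft update" is read as Polyak averaging of parameters. The function names, the decay schedule, and all numbers are hypothetical and are not taken from the RAMBLE code.

```python
import math
from typing import List, Sequence


def gaussian_kl(mu_p: Sequence[float], std_p: Sequence[float],
                mu_q: Sequence[float], std_q: Sequence[float]) -> float:
    """Closed-form KL(p || q) for diagonal Gaussians, summed over action dims."""
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return kl


def kl_weight(step: int, beta0: float = 1.0, decay_steps: int = 10_000) -> float:
    """Hypothetical linear decay: the IL prior dominates early, RL dominates later."""
    return beta0 * max(0.0, 1.0 - step / decay_steps)


def regularized_actor_loss(rl_loss: float, kl: float, beta: float) -> float:
    """Assumed actor objective: RL loss plus a KL penalty toward the IL policy."""
    return rl_loss + beta * kl


def soft_update(target: List[float], source: Sequence[float], tau: float) -> List[float]:
    """Polyak averaging: target <- (1 - tau) * target + tau * source."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]


if __name__ == "__main__":
    # Hypothetical policy outputs for a 2-D action (steer, throttle).
    mu_il, std_il = [0.2, -0.1], [0.3, 0.3]    # frozen IL-initialized policy
    mu_rl, std_rl = [0.25, 0.0], [0.35, 0.25]  # policy currently trained by RL
    kl = gaussian_kl(mu_rl, std_rl, mu_il, std_il)
    loss = regularized_actor_loss(rl_loss=-1.5, kl=kl, beta=kl_weight(step=2_000))
    print(f"KL = {kl:.4f}, actor loss = {loss:.4f}")

    # Softly move hypothetical target parameters toward the online ones.
    print(soft_update([0.0, 1.0], [1.0, 3.0], tau=0.005))
```

Decaying the KL weight rather than dropping the penalty at once is one way to avoid an abrupt policy shift when the RL gradient takes over; the actual schedule and KL direction used by RAMBLE may differ.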
