InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model
Feeza Khan Khanzada, Jaerock Kwon
TL;DR
The paper tackles generalization under sparse rewards in autonomous driving by proposing InDRiVE, a Dreamer-based model-based RL method that relies solely on intrinsic, disagreement-based rewards from an ensemble world model. It develops a latent world model using a Recurrent State-Space Model and latent-disagreement rewards to drive exploration, followed by a two-phase training procedure that enables zero-shot or few-shot adaptation to downstream tasks like lane following and collision avoidance. Empirical results in CARLA show InDRiVE achieves higher success rates and fewer infractions than DreamerV2 and DreamerV3 baselines while using significantly fewer training steps, and demonstrates robust zero-shot transfer to unseen towns with rapid fine-tuning when needed. The work highlights the viability of fully intrinsic exploration for learning robust, scalable driving policies and points to broader implications for self-supervised, multi-task autonomous systems.
Abstract
Model-based Reinforcement Learning (MBRL) has emerged as a promising paradigm for autonomous driving, where data efficiency and robustness are critical. Yet existing solutions often rely on carefully crafted, task-specific extrinsic rewards, limiting generalization to new tasks or environments. In this paper, we propose InDRiVE (Intrinsic Disagreement based Reinforcement for Vehicle Exploration), a method that leverages purely intrinsic, disagreement-based rewards within a Dreamer-based MBRL framework. By training an ensemble of world models, the agent actively explores high-uncertainty regions of its environments without any task-specific feedback. This approach yields a task-agnostic latent representation, allowing for rapid zero-shot or few-shot fine-tuning on downstream driving tasks such as lane following and collision avoidance. Experimental results in both seen and unseen environments demonstrate that InDRiVE achieves higher success rates and fewer infractions than DreamerV2 and DreamerV3 baselines, despite using significantly fewer training steps. Our findings highlight the effectiveness of purely intrinsic exploration for learning robust vehicle control behaviors, paving the way for more scalable and adaptable autonomous driving systems.
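To make the idea of a disagreement-based intrinsic reward concrete, the following is a minimal sketch and not the authors' implementation. It assumes the reward is the variance of the ensemble members' next-latent predictions, averaged over latent dimensions, which is one common formulation in disagreement-driven exploration; the abstract does not state the exact formula. The function name `disagreement_reward` and the toy prediction values are hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def disagreement_reward(predictions: Sequence[Sequence[float]]) -> float:
    """Mean per-dimension variance across ensemble predictions.

    predictions[k][d] is ensemble member k's prediction for latent
    dimension d (an assumed layout, chosen for illustration only).
    """
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    dims = len(predictions[0])
    if dims == 0:
        raise ValueError("predictions must have at least one dimension")
    if any(len(p) != dims for p in predictions):
        raise ValueError("all predictions must have the same dimensionality")
    # Variance over ensemble members, computed separately per latent dimension.
    per_dim = [pvariance([p[d] for p in predictions]) for d in range(dims)]
    # Average over dimensions to obtain a single scalar reward.
    return sum(per_dim) / dims


# Hypothetical next-latent predictions from three ensemble members.
agree = [[0.5, 1.0], [0.5, 1.0], [0.5, 1.0]]      # members agree: no novelty
disagree = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]   # members disagree on dim 0

print(disagree_reward := disagreement_reward(disagree))
print(disagreement_reward(agree))
```

Under this assumption, states where the ensemble members agree yield zero reward, while states where they diverge yield a positive reward, which is what steers the agent toward high-uncertainty regions without any task-specific signal.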
