CDE: Concept-Driven Exploration for Reinforcement Learning
Le Mao, Andrew H. Liu, Renos Zabounidis, Zachary Kingston, Joseph Campbell
TL;DR
CDE tackles exploration in visual RL under sparse rewards by using a pre-trained vision-language model (VLM) to generate object-centric concepts from task descriptions. The policy learns to reconstruct these concepts through a concept-embedding framework, and its reconstruction accuracy serves as an intrinsic reward that drives targeted exploration. Because the VLM is required only during training, the approach offers a practical path to deployment without online VLM dependence. CDE yields robust, object-centric exploration across five simulated visual manipulation tasks and demonstrates sim-to-real transfer on a real Franka arm, reaching an 80% real-world success rate. By incorporating dual object representations for visible and non-visible states, CDE remains effective with wrist-mounted cameras and is resilient to noisy VLM outputs.
Abstract
Intelligent exploration remains a critical challenge in reinforcement learning (RL), especially in visual control tasks. Unlike low-dimensional state-based RL, visual RL must extract task-relevant structure from raw pixels, making exploration inefficient. We propose Concept-Driven Exploration (CDE), which leverages a pre-trained vision-language model (VLM) to generate object-centric visual concepts from textual task descriptions as weak, potentially noisy supervisory signals. Rather than directly conditioning on these noisy signals, CDE trains a policy to reconstruct the concepts via an auxiliary objective, using reconstruction accuracy as an intrinsic reward to guide exploration toward task-relevant objects. Because the policy internalizes these concepts, VLM queries are only needed during training, reducing dependence on external models during deployment. Across five challenging simulated visual manipulation tasks, CDE achieves efficient, targeted exploration and remains robust to noisy VLM predictions. Finally, we demonstrate real-world transfer by deploying CDE on a Franka Research 3 arm, attaining an 80% success rate in a real-world manipulation task.
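To make the intrinsic-reward idea concrete, the sketch below shows one possible reading of "reconstruction accuracy as an intrinsic reward". It is not the authors' implementation. It assumes the concept is a flattened binary object mask from the VLM, that the policy's auxiliary head outputs per-pixel probabilities, and that accuracy is measured by intersection-over-union (IoU). The names `concept_iou` and `shaped_reward`, the threshold, and the weight `beta` are all hypothetical.

```python
from typing import Sequence


def concept_iou(pred_probs: Sequence[float],
                vlm_mask: Sequence[int],
                threshold: float = 0.5) -> float:
    """IoU between a thresholded predicted mask and a (possibly noisy) VLM mask.

    Both inputs are flattened masks of equal length. Using IoU as the accuracy
    measure is an assumption of this sketch, not a detail from the paper.
    """
    if len(pred_probs) != len(vlm_mask):
        raise ValueError("predicted and VLM masks must have the same length")
    pred = [p >= threshold for p in pred_probs]
    target = [bool(t) for t in vlm_mask]
    intersection = sum(p and t for p, t in zip(pred, target))
    union = sum(p or t for p, t in zip(pred, target))
    # Assumption: if neither mask marks any pixel (object not in view),
    # give no bonus, so looking away from the object is not rewarded.
    return intersection / union if union else 0.0


def shaped_reward(task_reward: float,
                  pred_probs: Sequence[float],
                  vlm_mask: Sequence[int],
                  beta: float = 0.1) -> float:
    """Sparse task reward plus a weighted intrinsic bonus (hypothetical form)."""
    return task_reward + beta * concept_iou(pred_probs, vlm_mask)


if __name__ == "__main__":
    # Toy 4-pixel example: the prediction covers two of the three object pixels.
    pred = [0.9, 0.8, 0.2, 0.1]
    mask = [1, 1, 1, 0]
    print(concept_iou(pred, mask))         # IoU = 2/3
    print(shaped_reward(0.0, pred, mask))  # bonus only, since task reward is 0
```

Under these assumptions the VLM mask is needed only to compute the training-time bonus. At deployment the policy acts on its own observations, which matches the abstract's statement that VLM queries are required only during training.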
