Contrastive Initial State Buffer for Reinforcement Learning
Nico Messikommer, Yunlong Song, Davide Scaramuzza
TL;DR
This work tackles sample-efficient reinforcement learning by reusing past experiences to steer data collection through an Initial State Buffer (ISB). It introduces a Contrastive Learning Buffer (CL-Buffer) that learns an embedding space in which states with similar learning experiences lie close together; K-Means clustering in that space then yields an adaptive, diverse set of initial states. On quadruped locomotion and drone racing tasks, the CL-Buffer accelerates convergence and improves final performance (e.g., up to 18.3% improvement on the quadruped task and a success rate of 0.9 versus 0.2 in drone racing) without altering the underlying RL algorithm. The approach is a general, prior-free mechanism for improving data efficiency in robotics and can be extended with priors or prioritized sampling for further gains.
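To make the clustering step concrete, here is a minimal sketch of how a diverse set of initial states could be drawn from an embedding space with K-Means. It assumes the embeddings have already been produced by some learned encoder (not shown), and it picks the state closest to each cluster centre, which is one plausible selection rule and not necessarily the one used in the paper. The names `kmeans` and `select_initial_states` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                dim = len(members[0])
                centroids[c] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids


def select_initial_states(embeddings, states, k, seed=0):
    """Return one stored state per cluster: the one nearest the centroid.

    Assumption: embeddings[i] is the embedding of states[i].
    """
    centroids = kmeans(embeddings, k, seed=seed)
    chosen = set()
    for c in centroids:
        idx = min(range(len(embeddings)), key=lambda i: sq_dist(embeddings[i], c))
        chosen.add(idx)
    return [states[i] for i in sorted(chosen)]


# Toy data: two well-separated groups of 2-D embeddings.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
states = ["a0", "a1", "a2", "b0", "b1", "b2"]
picked = select_initial_states(embeddings, states, k=2)
```

On this toy data the selection should return one state from each group, so the agent would be initialised in both regions of the embedding space instead of repeatedly in one.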
Abstract
In Reinforcement Learning, the trade-off between exploration and exploitation poses a complex challenge for achieving efficient learning from limited samples. While recent works have been effective in leveraging past experiences for policy updates, they often overlook the potential of reusing past experiences for data collection. Independent of the underlying RL algorithm, we introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment in order to guide it toward more informative states. We validate our approach on two complex robotic tasks without relying on any prior information about the environment: (i) locomotion of a quadruped robot traversing challenging terrains and (ii) a quadcopter drone racing through a track. The experimental results show that our initial state buffer achieves higher task performance than the nominal baseline while also speeding up training convergence.
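The idea of initializing the agent from previously visited states can be illustrated with a small sketch. It is independent of any RL algorithm and assumes an environment that can be reset to an arbitrary state. The class name `InitialStateBuffer`, the mixing probability `p_buffer`, and the uniform sampling rule are all assumptions made for illustration; the contrastive selection described in the paper would replace the uniform draw.

```python
import random


class InitialStateBuffer:
    """Stores visited states and proposes them as episode start states.

    Hypothetical sketch: uniform sampling stands in for a learned
    selection strategy.
    """

    def __init__(self, capacity, p_buffer=0.5, seed=0):
        self.capacity = capacity    # maximum number of stored states
        self.p_buffer = p_buffer    # probability of starting from the buffer
        self.states = []
        self.rng = random.Random(seed)

    def add(self, state):
        """Store a visited state, evicting the oldest one when full."""
        if len(self.states) >= self.capacity:
            self.states.pop(0)
        self.states.append(state)

    def sample_initial_state(self, default_state):
        """Return a buffered state with probability p_buffer, else the default."""
        if self.states and self.rng.random() < self.p_buffer:
            return self.rng.choice(self.states)
        return default_state


# Usage: record states during rollouts, then query at each episode reset.
buffer = InitialStateBuffer(capacity=2, p_buffer=1.0)
start_when_empty = buffer.sample_initial_state(default_state="nominal")
for visited in ["s1", "s2", "s3"]:
    buffer.add(visited)
start_from_buffer = buffer.sample_initial_state(default_state="nominal")
```

Keeping a nonzero chance of the nominal start (a `p_buffer` below 1) is a design choice that keeps the original initial-state distribution in the training data.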
