VendiRL: A Framework for Self-Supervised Reinforcement Learning of Diversely Diverse Skills
Erik M. Lintunen
TL;DR
VendiRL is a self-supervised reinforcement learning framework that reframes skill diversity through the Vendi Score, an ecology-inspired diversity metric computed from the eigenvalues of a kernel (similarity) matrix. Because similarity functions are plug-in and composable, the framework supports multiple notions of diversity and provides a unified, interpretable way to evaluate skill diversity. The paper shows that different notions of similarity induce distinct behaviours and that diversity objectives can be mixed and matched, which could improve scalability and transfer to richly interactive environments. Overall, VendiRL decouples what counts as diverse from how rewards are computed, enabling principled design and benchmarking of diverse-skill pretraining.
Abstract
In self-supervised reinforcement learning (RL), one of the key challenges is learning a diverse set of skills to prepare agents for unknown future tasks. Despite impressive advances, scalability and evaluation remain persistent issues. Regarding scalability, the search for meaningful skills can be obscured by high-dimensional feature spaces, where relevant features may vary across downstream task domains. Regarding evaluation, measuring skill diversity typically requires a hard commitment to one specific notion of what it means for skills to be diverse. This can lead to inconsistencies in how skill diversity is understood, makes results across different approaches hard to compare, and leaves many forms of diversity unexplored. To address these issues, we adopt the Vendi Score, a measure of sample diversity that translates ideas from ecology to machine learning and allows the user to specify and evaluate any desired form of diversity. We demonstrate how this metric facilitates skill evaluation and introduce VendiRL, a unified framework for learning diversely diverse sets of skills. Given distinct similarity functions, VendiRL motivates distinct forms of diversity, which could support skill-diversity pretraining in new and richly interactive environments where optimising for various forms of diversity may be desirable.
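To make the metric concrete, the sketch below computes the Vendi Score as it is usually defined: for n samples with a positive semidefinite similarity matrix K whose diagonal entries are 1, the score is exp(-sum_i lambda_i log lambda_i), where lambda_i are the eigenvalues of K / n. The result ranges from 1 (all samples identical under the chosen similarity) to n (all samples fully dissimilar). Everything else here is an assumption made for illustration and is not taken from the paper's implementation: the function names, the use of one small feature vector per skill, the two example kernels, and the equal-weight mixture.

```python
import numpy as np


def vendi_score(K: np.ndarray) -> float:
    """Vendi Score of an n x n PSD similarity matrix K with unit diagonal."""
    n = K.shape[0]
    # Eigenvalues of K / n are non-negative and sum to 1 when diag(K) = 1.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop zero (or round-off negative) eigenvalues; 0 * log(0) is taken as 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


def rbf_kernel(X: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Similarity based on Euclidean distance between feature vectors."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def cosine_kernel(X: np.ndarray) -> np.ndarray:
    """Similarity based on direction only (assumes no all-zero rows)."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit @ unit.T


# Hypothetical per-skill features, e.g. a 2-D summary of where each skill ends up.
identical = np.ones((4, 2))
spread = np.array([[1.0, 1.0], [10.0, 1.0], [1.0, 10.0], [10.0, 10.0]])

vs_identical = vendi_score(rbf_kernel(identical))
vs_rbf = vendi_score(rbf_kernel(spread))
vs_cos = vendi_score(cosine_kernel(spread))

# A convex combination of unit-diagonal PSD kernels is again a valid kernel,
# which is one simple way to mix two notions of similarity.
K_mix = 0.5 * rbf_kernel(spread) + 0.5 * cosine_kernel(spread)
vs_mix = vendi_score(K_mix)

print("identical skills, RBF:", vs_identical)
print("spread skills, RBF:   ", vs_rbf)
print("spread skills, cosine:", vs_cos)
print("spread skills, mixed: ", vs_mix)
```

On this toy data the same four skills count as roughly four distinct skills under the distance-based kernel, but as fewer than two under the direction-based kernel, because two of the feature vectors point the same way and all four lie in a plane. The mixed kernel lands in between. This is the sense in which the choice of similarity function, rather than the score itself, determines which form of diversity is being measured.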
