Representation-Driven Reinforcement Learning
Ofir Nabati, Guy Tennenholtz, Shie Mannor
TL;DR
RepRL is a representation-driven reinforcement learning framework that maps each policy $\pi$ to a low-dimensional latent representation $f(\pi)$ in which the policy's value is linear, $v(\pi) = \langle f(\pi), w\rangle$, so that contextual-bandit algorithms can guide exploration. The representation is learned via variational inference, and decision sets are constructed in either policy space or latent space, which reframes the exploration-exploitation problem as a representation-exploitation problem. The framework is instantiated as RepRL-ES and RepRL-PG and evaluated on MuJoCo and MinAtar, with notable gains in sparse-reward settings, highlighting the importance of policy representation for efficient exploration. The work shifts the focus of RL from solely improving optimization in policy space to treating representation quality as a lever for exploration efficiency, and it suggests future integration with large-scale pretraining and broader bandit formulations.
Abstract
We present a representation-driven framework for reinforcement learning. By representing policies as estimates of their expected values, we leverage techniques from contextual bandits to guide exploration and exploitation. In particular, embedding a policy network into a linear feature space allows us to reframe the exploration-exploitation problem as a representation-exploitation problem, where good policy representations enable optimal exploration. We demonstrate the effectiveness of this framework through its application to evolutionary and policy gradient-based approaches, leading to significantly improved performance compared to traditional methods. Our framework provides a new perspective on reinforcement learning, highlighting the importance of policy representation in determining optimal exploration-exploitation strategies.
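To make the linear-value idea concrete, the following is a minimal sketch of how a linear bandit could choose among candidate policies once each policy has a feature vector and its value is modeled as an inner product with an unknown weight vector $w$. It is an illustration under stated assumptions, not the authors' implementation: the feature vectors are fixed random vectors standing in for a learned representation, the decision set is finite, returns are simulated as the linear value plus Gaussian noise, and a UCB-style rule is used as one possible bandit algorithm. The names `linucb_select`, `run_linear_bandit`, `alpha`, `lam`, and `noise` are hypothetical.

```python
import numpy as np


def linucb_select(features, A, b, alpha=1.0):
    """Return the index of the candidate with the highest upper confidence bound.

    features: (n, d) array, one row per candidate policy representation.
    A, b: ridge-regression statistics (A = lam*I + sum x x^T, b = sum x * return).
    """
    A_inv = np.linalg.inv(A)
    w_hat = A_inv @ b                          # ridge estimate of w
    mean = features @ w_hat                    # estimated values <f(pi), w_hat>
    # Exploration bonus sqrt(x^T A^{-1} x) for every candidate row x.
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
    return int(np.argmax(mean + alpha * bonus))


def run_linear_bandit(features, w_true, rounds=200, noise=0.1, lam=1.0, seed=0):
    """Simulate a UCB-style linear bandit over a fixed, finite decision set."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    A = lam * np.eye(d)
    b = np.zeros(d)
    for _ in range(rounds):
        i = linucb_select(features, A, b)
        x = features[i]
        # Assumption: the observed return is the linear value plus Gaussian noise.
        observed_return = x @ w_true + noise * rng.standard_normal()
        A += np.outer(x, x)
        b += x * observed_return
    return np.linalg.solve(A, b)               # final ridge estimate of w


# Hypothetical stand-in for learned policy representations f(pi).
rng = np.random.default_rng(1)
features = rng.standard_normal((50, 4))
w_true = np.array([1.0, -0.5, 0.25, 0.0])

w_hat = run_linear_bandit(features, w_true)
best_estimated = int(np.argmax(features @ w_hat))
```

In this sketch the quality of exploration depends entirely on the feature vectors: if the true value is not close to linear in them, the confidence bonus no longer reflects real uncertainty about a policy's value. That is the sense in which exploration becomes a question of representation.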
