Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine
TL;DR
The paper introduces Functional Reward Encoding (FRE), a framework for pre-training a generalist, zero-shot RL agent from unlabeled offline trajectories by learning a latent encoding of arbitrary reward functions. A transformer-based variational encoder maps a set of (state, reward) samples to a latent z; a decoder predicts rewards from z, and a policy conditioned on z is trained to maximize the encoded reward. Trained with an offline RL objective on a diverse, domain-agnostic prior of random reward functions, FRE agents solve unseen tasks zero-shot from a small number of reward-annotated samples and often outperform previous zero-shot RL and offline RL methods on simulated robotic benchmarks. Pre-training requires no task-specific reward labels, and adaptation to a new objective requires no online fine-tuning.
Abstract
Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream task in a zero-shot manner? In this work, we present functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream task in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods. Code for this project is provided at: https://github.com/kvfrans/fre
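
To make the idea of "state-reward samples" concrete, the sketch below shows one way the encoder's input could be assembled: draw a random reward function, pick a few states from the unlabeled data, and label them. It is an illustration only and is not taken from the project code. The two reward families (random linear and goal-reaching), the helper names, the 2-D toy states, and the context size of 32 are all assumptions made for this example.

```python
import random

def sample_linear_reward(state_dim, rng):
    """Hypothetical prior: reward is a random linear function of the state."""
    w = [rng.uniform(-1.0, 1.0) for _ in range(state_dim)]
    return lambda s: sum(wi * si for wi, si in zip(w, s))

def sample_goal_reward(states, rng, radius=0.5):
    """Hypothetical prior: 0 near a randomly chosen dataset state, -1 elsewhere."""
    goal = rng.choice(states)
    def reward(s):
        dist = sum((a - b) ** 2 for a, b in zip(s, goal)) ** 0.5
        return 0.0 if dist < radius else -1.0
    return reward

def make_context(states, reward_fn, k, rng):
    """Label k randomly drawn states with reward_fn, giving (state, reward) pairs."""
    picked = rng.sample(states, k)
    return [(s, reward_fn(s)) for s in picked]

rng = random.Random(0)

# Stand-in for unlabeled offline data: 1000 random 2-D states (an assumption).
states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]

# Draw one reward function from the assumed prior.
if rng.random() < 0.5:
    reward_fn = sample_linear_reward(2, rng)
else:
    reward_fn = sample_goal_reward(states, rng)

# The set a variational encoder would map to a latent z.
context = make_context(states, reward_fn, k=32, rng=rng)
```

At test time the same `make_context` step would be applied to the small set of reward-annotated samples for the new task, so the encoder sees downstream tasks in the same form as the random pre-training tasks.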
