$\pi2\text{vec}$: Policy Representations with Successor Features
Gianluca Scarpellini, Ksenia Konyushkova, Claudio Fantacci, Tom Le Paine, Yutian Chen, Misha Denil
TL;DR
π2vec targets the cost of policy evaluation in robotics by learning task-agnostic representations of black-box policies entirely from offline data. It builds a policy embedding Ψ_π^φ in three steps: a policy-agnostic encoder φ maps states to features, a policy-specific successor-feature encoder ψ_π^φ is learned with offline fitted Q evaluation (FQE), and the successor features are aggregated over a set of canonical states. A supervised predictor then maps the embedding to policy performance. The authors report improved offline policy ranking and selection across several real and simulated domains, and they find that the choice of foundation-model encoder φ matters for the result. Because selection is fully offline and φ can be any suitable foundation model, π2vec is presented as a scalable, data-efficient tool for policy evaluation in resource-constrained robotic settings.
Abstract
This paper describes $\pi2\text{vec}$, a method for representing behaviors of black-box policies as feature vectors. The policy representations capture how the statistics of foundation model features change in response to the policy behavior in a task-agnostic way, and can be trained from offline data, allowing them to be used in offline policy selection. This work provides a key piece of a recipe for fusing together three modern lines of research: offline policy evaluation as a counterpart to offline RL, foundation models as generic and powerful state representations, and efficient policy selection in resource-constrained environments.
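To make the three-step pipeline from the TL;DR concrete, here is a minimal tabular sketch in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the toy dynamics, the random feature matrix standing in for a foundation-model encoder φ, the tabular averaging update standing in for neural FQE, the mean as the aggregation over canonical states, the ridge regressor as the performance predictor, and every function name are hypothetical choices made for this example.

```python
import numpy as np

def fit_successor_features(transitions, policy, phi, n_actions, gamma=0.9, n_iters=200):
    """Tabular FQE-style iteration: psi(s, a) <- phi(s) + gamma * psi(s', pi(s')).

    transitions: integer array of (s, a, s_next) rows from an offline dataset.
    policy: integer array, policy[s] is the evaluated policy's action in state s.
    phi: (n_states, d) features from a fixed, policy-agnostic encoder.
    """
    n_states, d = phi.shape
    s, a, s_next = transitions.T
    psi = np.zeros((n_states, n_actions, d))
    counts = np.zeros((n_states, n_actions))
    np.add.at(counts, (s, a), 1.0)
    seen = counts > 0  # only (s, a) pairs present in the dataset are updated
    for _ in range(n_iters):
        targets = phi[s] + gamma * psi[s_next, policy[s_next]]
        sums = np.zeros_like(psi)
        np.add.at(sums, (s, a), targets)
        psi[seen] = sums[seen] / counts[seen][:, None]
    return psi

def policy_embedding(psi, policy, canonical_states):
    """Aggregate psi(s, pi(s)) over canonical states (here: a plain mean)."""
    return psi[canonical_states, policy[canonical_states]].mean(axis=0)

def fit_ridge(X, y, reg=1e-3):
    """Closed-form ridge regression with a bias column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + reg * np.eye(Xb.shape[1]), Xb.T @ y)

def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def exact_return(policy, reward, canonical_states, gamma):
    """Ground-truth mean discounted return for the toy dynamics s' = (s + a) % n."""
    n = len(policy)
    P = np.zeros((n, n))
    P[np.arange(n), (np.arange(n) + policy) % n] = 1.0
    v = np.linalg.solve(np.eye(n) - gamma * P, reward)
    return v[canonical_states].mean()

# Toy setup: every number below is illustrative, none comes from the paper.
rng = np.random.default_rng(0)
n_states, n_actions, d, gamma = 6, 2, 4, 0.9
phi = rng.normal(size=(n_states, d))   # stand-in for foundation-model features
w_true = rng.normal(size=d)
reward = phi @ w_true                  # assumption: reward is linear in phi

# Offline dataset covering every (s, a): action 0 stays, action 1 moves to the next state.
transitions = np.array([(s, a, (s + a) % n_states)
                        for s in range(n_states) for a in range(n_actions)])
canonical_states = np.arange(n_states)

policies = rng.integers(0, n_actions, size=(8, n_states))
embeddings = np.stack([
    policy_embedding(
        fit_successor_features(transitions, pi, phi, n_actions, gamma),
        pi, canonical_states)
    for pi in policies
])
returns = np.array([exact_return(pi, reward, canonical_states, gamma) for pi in policies])

# Fit the performance predictor on six policies, then score the two held-out ones.
w = fit_ridge(embeddings[:6], returns[:6])
predicted = predict(w, embeddings[6:])
```

In this toy setting the reward is linear in φ by construction, so a policy's return is an exact linear function of its embedding. That is the reason a simple regressor suffices here; with real foundation-model features the relationship is only approximate and the choice of φ affects how well performance can be predicted.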
