Zero-Shot Reinforcement Learning via Function Encoders
Tyler Ingebrand, Amy Zhang, Ufuk Topcu
TL;DR
The paper tackles zero-shot transfer in reinforcement learning by introducing the function encoder, a representation learning method that encodes the function that varies between tasks (the reward or the transition function) as a linear combination of learned non-linear basis functions. The encoder produces a coefficient vector c_f that serves as an informative task context, so any RL algorithm can condition its policy and value function on the current task without retraining. The approach is demonstrated in hidden-parameter, multi-agent, and multi-task RL domains, where it shows improved data efficiency, stable training, and competitive asymptotic performance relative to strong baselines. A key strength is that the encoding acts as a linear operator: it preserves linear relationships among functions, which supports generalization to unseen tasks whose reward or transition functions lie in the span of the learned basis functions. The work suggests that function-encoded task descriptions are broadly applicable for efficient transfer in diverse RL settings.
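The coefficient vector c_f and the linear-operator property can be illustrated with a small sketch. It is not the paper's implementation. The basis functions here are fixed and hand-picked (sin, cos, tanh) rather than learned, and the coefficients are fit by ordinary least squares, which is an assumption made for illustration. The names `basis` and `encode` are hypothetical.

```python
import numpy as np

def basis(x):
    """Hypothetical fixed non-linear basis; the paper learns its basis functions."""
    return np.stack([np.sin(x), np.cos(x), np.tanh(x)], axis=-1)  # shape (n, 3)

def encode(x, fx):
    """Least-squares coefficients c such that basis(x) @ c approximates fx.

    Assumption: least squares stands in for whatever coefficient
    estimator the paper uses.
    """
    G = basis(x)
    c, *_ = np.linalg.lstsq(G, fx, rcond=None)
    return c

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)   # sampled inputs

# Two example functions that lie in the span of the basis.
f = 2.0 * np.sin(x) - 0.5 * np.tanh(x)
g = 1.5 * np.cos(x)

c_f = encode(x, f)                     # approximately [2, 0, -0.5]
c_g = encode(x, g)                     # approximately [0, 1.5, 0]
c_mix = encode(x, 3.0 * f + g)         # encoding of a linear combination

# Linearity: encoding 3f + g gives 3 * c_f + c_g (up to numerical error).
print(np.allclose(c_mix, 3.0 * c_f + c_g))
```

Because the encoding is linear in the function values, a new task whose function is a linear combination of seen functions maps to the same linear combination of their coefficient vectors.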
Abstract
Although reinforcement learning (RL) can solve many challenging sequential decision-making problems, achieving zero-shot transfer across related tasks remains a challenge. The difficulty lies in finding a good representation for the current task so that the agent understands how it relates to previously seen tasks. To achieve zero-shot transfer, we introduce the function encoder, a representation learning algorithm that represents a function as a weighted combination of learned, non-linear basis functions. By using a function encoder to represent the reward function or the transition function, the agent has information on how the current task relates to previously seen tasks via a coherent vector representation. Thus, the agent is able to achieve transfer between related tasks at run time with no additional training. We demonstrate state-of-the-art data efficiency, asymptotic performance, and training stability in three RL fields by augmenting basic RL algorithms with a function encoder task representation.
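One way to picture "augmenting basic RL algorithms with a function encoder task representation" is to feed the task's coefficient vector to the policy alongside the state. The sketch below is a minimal illustration under assumptions: the linear policy head, the dimensions, the example coefficient vectors, and the name `policy_logits` are all hypothetical, and the paper's architectures may differ.

```python
import numpy as np

def policy_logits(state, c_task, W, b):
    """Linear policy head over the concatenated [state, task context] input.

    Hypothetical architecture: a real agent would typically use a deeper
    network, but the conditioning idea is the same.
    """
    z = np.concatenate([state, c_task])
    return W @ z + b

rng = np.random.default_rng(0)
state_dim, context_dim, n_actions = 4, 3, 2

# Randomly initialized weights stand in for trained policy parameters.
W = rng.normal(size=(n_actions, state_dim + context_dim))
b = np.zeros(n_actions)

state = rng.normal(size=state_dim)

# Illustrative coefficient vectors for two different tasks.
c_task_a = np.array([2.0, 0.0, -0.5])
c_task_b = np.array([0.0, 1.5, 0.0])

# Same state, different task context: the policy output changes
# without any change to the weights.
logits_a = policy_logits(state, c_task_a, W, b)
logits_b = policy_logits(state, c_task_b, W, b)
print(logits_a, logits_b)
```

The weights are shared across tasks, so switching tasks at run time only requires computing a new coefficient vector, not retraining the policy.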
