Zero-Shot Reinforcement Learning via Function Encoders

Tyler Ingebrand; Amy Zhang; Ufuk Topcu

TL;DR

The paper tackles zero-shot transfer in reinforcement learning by introducing the function encoder, a representation learning method that encodes perturbing functions (rewards or transitions) as a linear combination of learned non-linear basis functions. The encoder produces a coefficient vector c_f that serves as an informative task context, allowing any RL algorithm to condition policies and value functions on the current task without retraining. The approach is demonstrated across hidden-parameter, multi-agent, and multi-task RL domains, showing improved data efficiency, stable training, and competitive asymptotic performance relative to strong baselines. A key strength is the linear-operator property of the encoding, which preserves linear relationships among functions and enables generalization to unseen tasks that are linear combinations of the learned basis functions. The results suggest that function-encoded task descriptions can support efficient transfer across diverse RL settings.
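The encoding step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the coefficients are estimated as an empirical (Monte Carlo) inner product between the function's sampled outputs and each basis function, and it swaps the learned neural-network basis for three fixed orthonormal polynomials on [-1, 1]. The names `basis`, `encode`, `decode`, and `VOLUME` are hypothetical.

```python
import numpy as np

def basis(x):
    """Stand-in basis g_1..g_3 (assumption: the paper learns these as neural
    networks; fixed orthonormal Legendre polynomials on [-1, 1] are used here)."""
    x = np.asarray(x, dtype=float)
    return np.stack(
        [
            np.full_like(x, np.sqrt(0.5)),
            np.sqrt(1.5) * x,
            np.sqrt(2.5) * 0.5 * (3.0 * x**2 - 1.0),
        ],
        axis=-1,
    )  # shape (n, 3)

VOLUME = 2.0  # length of the input domain [-1, 1]

def encode(xs, ys):
    """Estimate c_f from example data: c_i ~ (V / N) * sum_j f(x_j) * g_i(x_j)."""
    return VOLUME * np.mean(ys[:, None] * basis(xs), axis=0)

def decode(c_f, x):
    """Predict f_hat(x) = sum_i c_i * g_i(x)."""
    return basis(x) @ c_f

rng = np.random.default_rng(0)
c_true = np.array([1.0, -2.0, 0.5])          # hypothetical ground-truth coefficients
xs = rng.uniform(-1.0, 1.0, size=200_000)    # example inputs, sampled uniformly
ys = decode(c_true, xs)                      # example outputs f(x_j)

c_hat = encode(xs, ys)                       # task representation c_f
x_query = np.array([-0.5, 0.0, 0.5])
f_hat = decode(c_hat, x_query)               # predictions at new inputs
```

Because the stand-in basis is orthonormal, the inner-product estimate recovers the coefficients up to sampling noise; with a learned basis, the same two functions would simply call the trained networks instead.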

Abstract

Although reinforcement learning (RL) can solve many challenging sequential decision making problems, achieving zero-shot transfer across related tasks remains a challenge. The difficulty lies in finding a good representation for the current task so that the agent understands how it relates to previously seen tasks. To achieve zero-shot transfer, we introduce the function encoder, a representation learning algorithm which represents a function as a weighted combination of learned, non-linear basis functions. By using a function encoder to represent the reward function or the transition function, the agent has information on how the current task relates to previously seen tasks via a coherent vector representation. Thus, the agent is able to achieve transfer between related tasks at run time with no additional training. We demonstrate state-of-the-art data efficiency, asymptotic performance, and training stability in three RL fields by augmenting basic RL algorithms with a function encoder task representation.
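One simple way to augment an RL algorithm with the task representation is to concatenate it with the observation before it enters the policy or value network. The sketch below assumes this concatenation scheme and a hypothetical linear policy head; the abstract says only that basic RL algorithms are augmented with the representation, so the exact conditioning mechanism here is an assumption.

```python
import numpy as np

def policy_input(state, c_f):
    """Augment the observation with the task representation c_f (assumed scheme)."""
    return np.concatenate([state, c_f])

def linear_policy(weights, state, c_f):
    """Hypothetical linear policy head acting on the augmented input."""
    return weights @ policy_input(state, c_f)

rng = np.random.default_rng(0)
state = rng.normal(size=4)                 # a 4-dimensional observation
weights = rng.normal(size=(2, 4 + 3))      # 2 action dimensions, 3 coefficients

c_task_a = np.array([1.0, -2.0, 0.5])      # representation of one task
c_task_b = np.array([0.0, 1.0, 0.0])       # representation of another task

# Same state and same weights, different task representation.
action_a = linear_policy(weights, state, c_task_a)
action_b = linear_policy(weights, state, c_task_b)
```

The same policy weights produce different actions for the two tasks, which is what allows a single trained policy to adapt at run time once the representation of a new task is computed.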

Paper Structure (32 sections, 1 theorem, 19 equations, 10 figures, 1 algorithm)


Introduction
Contributions
Related Works
Zero-shot RL
Basis Functions
Preliminaries
The Function Encoder
Motivation
Training a Function Encoder
Orthonormality
Zero-Shot RL via Function Encoders
A Key Assumption
Experiments
Hidden-Parameter System Identification
Multi-Agent Reinforcement Learning
...and 17 more sections

Key Result

Theorem 1

The function encoder's mapping from functions to representations is a linear operator.
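The paper's proof is not reproduced on this page. As a sketch of why the statement is plausible, assume the $i$-th coefficient of a function $f$ is defined as its inner product with the $i$-th basis function, $c_f^{(i)} = \langle f, g_i \rangle$ (the notation $c_f^{(i)}$, $g_i$, and $k$ is an assumption here). Linearity then follows from the linearity of the inner product in its first argument, for any scalars $a, b$ and functions $f_1, f_2$:

```latex
\begin{aligned}
c_{a f_1 + b f_2}^{(i)}
  &= \langle a f_1 + b f_2,\; g_i \rangle \\
  &= a \,\langle f_1, g_i \rangle + b \,\langle f_2, g_i \rangle \\
  &= a\, c_{f_1}^{(i)} + b\, c_{f_2}^{(i)}, \qquad i = 1, \dots, k.
\end{aligned}
```

Under that assumption, the representation of a linear combination of functions is the same linear combination of their representations.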

Figures (10)

Figure 1: A diagram representing the workflow of function encoders. The set of functions is converted into a set of representations via a function encoder. Those representations are passed into the RL algorithm as input to the policy and value functions. The represented functions can be reward functions and/or transition functions, depending on the setting.
Figure 2: A block diagram representing the flow of information in a function encoder. The top segment of the diagram shows how to use example data to compute the representation $c_f$. The bottom segment shows how to use $c_f$ to predict $\hat{f}(x)$ for a given input $x$.
Figure 3: Comparison of MLPs, transformers, and function encoders on system identification of a hidden-parameter MDP. Each algorithm is run for three seeds, with the shaded areas representing minimum and maximum values.
Figure 4: A plot of cosine similarity between function encoder representations for hidden-parameter environments. Axes show the hidden parameter value as a ratio of its default value in the Half-Cheetah environment. This figure shows that the function encoder representations directly relate to the underlying hidden parameters in a consistent fashion, where an increasing change in a given hidden parameter leads to an increasing change in the representation.
Figure 5: Training curves for four algorithms on a partially observable game of tag. The adversary is randomly sampled from a pre-trained league. Each algorithm is run for five seeds, with shaded areas indicating minimum and maximum values.
...and 5 more figures

Theorems & Definitions (2)

Theorem 1
proof
