Making Universal Policies Universal

Niklas Höpner, David Kuric, Herke van Hoof

TL;DR

This work tackles learning generalist policies across agents that share an observation space but differ in their action spaces, by extending the universal policy framework to a cross-agent setting. It introduces UCAP, a diffusion-based planner trained on a pooled instruction-trajectory dataset and paired with agent-specific inverse dynamics models, enabling planning that generalizes across agents. Empirical results in BabyAI show positive transfer from pooling data. Conditioning on agent information, especially an encoding of the action space, delivers notable improvements, with gains of up to $42.20\%$ in task completion accuracy over single-agent training. The study also analyzes generalization to unseen agents and discusses limitations such as slower planning and the challenge of extending to more diverse observation spaces, outlining directions for scaling to larger, heterogeneous domains.

Abstract

The development of a generalist agent capable of solving a wide range of sequential decision-making tasks remains a significant challenge. We address this problem in a cross-agent setup where agents share the same observation space but differ in their action spaces. Our approach builds on the universal policy framework, which decouples policy learning into two stages: a diffusion-based planner that generates observation sequences and an inverse dynamics model that assigns actions to these plans. We propose a method for training the planner on a joint dataset composed of trajectories from all agents. This method offers the benefit of positive transfer by pooling data from different agents, while the primary challenge lies in adapting shared plans to each agent's unique constraints. We evaluate our approach on the BabyAI environment, covering tasks of varying complexity, and demonstrate positive transfer across agents. Additionally, we examine the planner's generalisation ability to unseen agents and compare our method to traditional imitation learning approaches. By training on a pooled dataset from multiple agents, our universal policy achieves an improvement of up to $42.20\%$ in task completion accuracy compared to a policy trained on a dataset from a single agent.
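
The two-stage decoupling described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: observations are reduced to grid positions, the diffusion planner is replaced by a greedy stand-in called `plan`, and the two action spaces (`CARDINAL`, `DIAGONAL`) and the helpers `inverse_dynamics` and `act` are hypothetical names. The point is only to show that the planner works in observation space while an agent-specific inverse dynamics model turns consecutive observations into actions.

```python
from typing import Dict, List, Tuple

Obs = Tuple[int, int]    # toy observation: an (x, y) grid position
Delta = Tuple[int, int]  # position change caused by one action

# Hypothetical action spaces: action name -> position change.
CARDINAL: Dict[str, Delta] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
}
DIAGONAL: Dict[str, Delta] = {
    **CARDINAL,
    "up_right": (1, -1), "up_left": (-1, -1),
    "down_right": (1, 1), "down_left": (-1, 1),
}


def plan(start: Obs, goal: Obs, action_space: Dict[str, Delta]) -> List[Obs]:
    """Greedy stand-in for a shared planner (NOT a diffusion model).

    It outputs a sequence of observations only, and is conditioned on the
    agent's action space so that every transition is feasible for that agent.
    """
    path = [start]
    while path[-1] != goal:
        x, y = path[-1]
        candidates = [(x + dx, y + dy) for dx, dy in action_space.values()]
        # Pick the reachable observation with the smallest Manhattan distance.
        path.append(min(candidates,
                        key=lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])))
    return path


def inverse_dynamics(obs: Obs, next_obs: Obs,
                     action_space: Dict[str, Delta]) -> str:
    """Agent-specific labelling of one observation transition with an action."""
    delta = (next_obs[0] - obs[0], next_obs[1] - obs[1])
    for name, change in action_space.items():
        if change == delta:
            return name
    raise ValueError(f"transition {obs} -> {next_obs} is infeasible for this agent")


def act(start: Obs, goal: Obs, action_space: Dict[str, Delta]) -> List[str]:
    """Plan in observation space, then label each consecutive pair."""
    observations = plan(start, goal, action_space)
    return [inverse_dynamics(a, b, action_space)
            for a, b in zip(observations, observations[1:])]
```

In this toy setting, `act((0, 0), (2, 2), CARDINAL)` yields four axis-aligned moves, while `act((0, 0), (2, 2), DIAGONAL)` yields two diagonal moves. Passing the diagonal agent's plan to `inverse_dynamics` with `CARDINAL` raises a `ValueError`, which mirrors the challenge of adapting a shared plan to each agent's constraints.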

Paper Structure

This paper contains 24 sections, 2 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: Example of the different plans the shared planner needs to generate for different agent types. The agent on the left follows the standard action space (forward, turn left, turn right) in the BabyAI environment, while the agent on the right can move to any of the surrounding squares and turn right.
  • Figure 2: Overview of how actions are generated by UCAP. Given the starting observation $x_{0}$, the instruction $c$ and the different types of agent information $k$ to condition on, a Heun sampler is applied for $T$ steps to generate an observation sequence of 3 timesteps from noise. Each pair of consecutive observations is then labelled by the inverse dynamics model to produce the action sequence the agent will take.
  • Figure 4: Mean task completion rate for a range of UCAP models on the three evaluation environments GoToObj (left), GoToDistractor (middle), GoToDistractorLarge (right) for an agent with the standard action space, which belongs to the in-distribution (ID) agent set. All results are averaged over 4 random seeds and error bars indicate the standard error.
  • Figure 5: Mean task completion rate over all ID and out-of-distribution (OOD) agent types in the GoToObj and GoToDistractor environments for the universal policy trained on a mixture of datasets, either with a visual agent type (AT), or without a visual agent type but conditioned on an encoding of the action space (AS). Results are averaged over 4 random seeds and error bars indicate the standard error.
  • Figure 6: Task completion rate for diffusion planners with different planning granularities trained on the mixture dataset.
  • ...and 4 more figures
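
The Figure 2 caption mentions a Heun sampler applied for $T$ steps to turn noise into an observation sequence. The following is a minimal sketch of a generic Heun (second-order) sampling loop, not the sampler used in the paper. It assumes the common formulation in which a denoiser $D(x, \sigma)$ defines the ODE $dx/d\sigma = (x - D(x, \sigma)) / \sigma$. The noise schedule, the flattened three-value "plan" `TARGET`, and `toy_denoiser` (which always returns `TARGET`) are hypothetical stand-ins for a learned, conditioned model.

```python
import random
from typing import Callable, List

Vector = List[float]


def heun_sample(x: Vector, sigmas: List[float],
                denoise: Callable[[Vector, float], Vector]) -> Vector:
    """Integrate dx/dsigma = (x - denoise(x, sigma)) / sigma with Heun's method.

    `sigmas` is a decreasing noise schedule that ends at 0.0.
    """
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        step = sigma_next - sigma
        # Slope at the current noise level.
        d = [(xi - di) / sigma for xi, di in zip(x, denoise(x, sigma))]
        # Euler predictor.
        x_euler = [xi + step * gi for xi, gi in zip(x, d)]
        if sigma_next > 0:
            # Heun corrector: average the slopes at both ends of the step.
            d_next = [(xi - di) / sigma_next
                      for xi, di in zip(x_euler, denoise(x_euler, sigma_next))]
            x = [xi + step * 0.5 * (a + b) for xi, a, b in zip(x, d, d_next)]
        else:
            # The slope is undefined at sigma = 0, so keep the Euler step.
            x = x_euler
    return x


# Hypothetical target: a 3-timestep plan flattened to three numbers.
TARGET: Vector = [0.0, 1.0, 2.0]


def toy_denoiser(x: Vector, sigma: float) -> Vector:
    """Ideal denoiser when all data sits at TARGET (assumption for this demo)."""
    return list(TARGET)


T = 10  # number of sampling steps
# Assumed geometric schedule from 80.0 down to 0.01, followed by 0.0.
sigmas = [80.0 * (0.01 / 80.0) ** (i / (T - 1)) for i in range(T)] + [0.0]
rng = random.Random(0)
noise = [rng.gauss(0.0, sigmas[0]) for _ in TARGET]
sampled_plan = heun_sample(noise, sigmas, toy_denoiser)
```

Because the toy denoiser always points at `TARGET`, the loop carries the initial noise to `TARGET` up to floating-point error. With a learned denoiser conditioned on $x_{0}$, $c$ and $k$, the same loop would instead produce a plan drawn from the model's distribution.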