REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments

Kaustubh Sridhar, Souradeep Dutta, Dinesh Jayaraman, Insup Lee

TL;DR

REGENT tackles rapid adaptation to unseen environments by combining retrieval augmentation with in-context learning in a semi-parametric policy. Starting from a strong, learning-free baseline, Retrieve and Play (R&P), REGENT pre-trains a transformer that conditions on sequences of query states and retrieved demonstrations, and interpolates between R&P and the learned policy via a distance-based weight. The approach yields state-of-the-art generalization on the JAT/Gato and ProcGen benchmarks with far fewer pre-training transitions and parameters, and remains effective on unseen tasks without finetuning. The work underscores retrieval as a powerful bias for generalist agents, provides formal sub-optimality bounds, and outlines future directions for extending this capability to longer horizons and broader embodiment diversity.

Abstract

Building generalist agents that can rapidly adapt to new environments is a key challenge for deploying AI in the digital and real worlds. Is scaling current agent architectures the most effective way to build generalist agents? We propose a novel approach to pre-train relatively small policies on relatively small datasets and adapt them to unseen environments via in-context learning, without any finetuning. Our key idea is that retrieval offers a powerful bias for fast adaptation. Indeed, we demonstrate that even a simple retrieval-based 1-nearest neighbor agent offers a surprisingly strong baseline for today's state-of-the-art generalist agents. From this starting point, we construct a semi-parametric agent, REGENT, that trains a transformer-based policy on sequences of queries and retrieved neighbors. REGENT can generalize to unseen robotics and game-playing environments via retrieval augmentation and in-context learning, achieving this with up to 3x fewer parameters and up to an order-of-magnitude fewer pre-training datapoints, significantly outperforming today's state-of-the-art generalist agents. Website: https://kaustubhsridhar.github.io/regent-research
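
To make the 1-nearest-neighbor baseline concrete, here is a minimal sketch of a retrieve-and-play step. It is an illustration, not the authors' implementation: the function name `retrieve_and_play`, the Euclidean distance, and the toy demonstration data are all assumptions made for this example.

```python
import numpy as np

def retrieve_and_play(query_state, demo_states, demo_actions):
    """Return the action paired with the demonstration state closest to the query."""
    # Euclidean distance from the query to every demonstration state
    # (the metric is an assumption made for this sketch).
    distances = np.linalg.norm(demo_states - query_state, axis=1)
    nearest = int(np.argmin(distances))
    return demo_actions[nearest], float(distances[nearest])

# Hypothetical demonstrations: three 2-D states, each with a discrete action.
demo_states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
demo_actions = np.array([0, 1, 2])

action, distance = retrieve_and_play(np.array([0.9, 0.1]), demo_states, demo_actions)
print(action, round(distance, 4))
```

The agent has no trainable parameters here: all of its behavior comes from the demonstrations it can retrieve from.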

Paper Structure

This paper contains 16 sections, 3 theorems, 3 equations, 19 figures, 8 tables.

Key Result

Theorem 5.2

The sub-optimality gap in environment $j$ is $J(\pi^*_j) - J(\pi_{\texttt{REGENT}}^{\theta}) \leq \min\{ H, H^2 (1 - e^{-\lambda d^I_{\mathcal{D}_j}}) \}$
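
The bound can be evaluated numerically to see how it behaves. The sketch below assumes that $H$ is the episode horizon and that $d^I_{\mathcal{D}_j}$ is a non-negative distance associated with the most isolated state of Definition 5.1; neither is spelled out in this summary, and the numeric values are hypothetical.

```python
import math

def suboptimality_bound(horizon, lam, isolation_distance):
    """Evaluate min{H, H^2 (1 - exp(-lambda * d))} from Theorem 5.2."""
    retrieval_term = horizon ** 2 * (1.0 - math.exp(-lam * isolation_distance))
    return min(float(horizon), retrieval_term)

# Hypothetical values: horizon 100, lambda 1.0.
tight = suboptimality_bound(horizon=100, lam=1.0, isolation_distance=0.0)
small = suboptimality_bound(horizon=100, lam=1.0, isolation_distance=0.001)
loose = suboptimality_bound(horizon=100, lam=1.0, isolation_distance=5.0)
print(tight, small, loose)
```

Under these assumptions the bound vanishes when the distance is zero, and it saturates at $H$ once the distance is large enough that the $H^2$ term exceeds $H$.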

Figures (19)

  • Figure 1: Problem setting in JAT/Gato environments.
  • Figure 2: Problem setting in ProcGen environments, adapted from MTT.
  • Figure 3: The REGENT architecture and overview. (1) A query state (from the unseen environment during deployment or from training environments' datasets during pre-training) is processed for retrieval. (2) The $n$ nearest states from a few demonstrations in an unseen environment or from a designated retrieval subset of pre-training environments' datasets are retrieved. These states, and their corresponding previous rewards and actions, are added to the context in order of their closeness to the query state, followed by the query state and previous reward. (3) The predictions from the REGENT transformer are combined with the first retrieved action. (4) At deployment, only the predicted query action is used. During pre-training, the loss from predicting all actions is used to train the transformer.
  • Figure 4: Normalized returns in the unseen Metaworld and Atari environments against the number of demonstration trajectories the agent can retrieve from or finetune on. Each agent is evaluated across 100 rollouts of different seeds in Metaworld and 15 rollouts of different seeds (with $p_{\text{sticky}}=0.05$) in Atari. We compute the overall mean and standard deviation over three training seeds for REGENT, REGENT Finetuned, the PEFT with IA3 baselines, and DRIL. See Table \ref{tab:unseen_all_all_data} for detailed results.
  • Figure 5: Normalized returns in unseen ProcGen environments against the number of demonstration trajectories the agent can retrieve from. REGENT and R&P agents are evaluated across 10 levels with 5 rollouts and $p_{\text{sticky}}=0.2$. We compute the overall mean and standard deviation over three training seeds for REGENT. The values for MTT are the best scores reported in MTT. See Table \ref{tab:ood_p_0.2_figure_values} for detailed results.
  • ...and 14 more figures
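
Steps (2) and (3) of the Figure 3 caption can be sketched roughly as follows. This is not the authors' code. It assumes a Euclidean distance, a discrete action space, and a distance-based weight of the form $e^{-\lambda d}$ (suggested by the exponential term in Theorem 5.2 but not confirmed by this summary). The names `build_context` and `combine` are hypothetical, rewards are omitted for brevity, and a uniform distribution stands in for the transformer's output.

```python
import numpy as np

def build_context(query_state, demo_states, demo_actions, n):
    """Pick the n nearest demonstration states, ordered closest first."""
    distances = np.linalg.norm(demo_states - query_state, axis=1)
    order = np.argsort(distances)[:n]
    return demo_states[order], demo_actions[order], distances[order]

def combine(transformer_probs, first_retrieved_action, nearest_distance, lam):
    """Blend a one-hot on the first retrieved action with the transformer's output."""
    weight = np.exp(-lam * nearest_distance)  # assumed form of the distance-based weight
    one_hot = np.zeros_like(transformer_probs)
    one_hot[first_retrieved_action] = 1.0
    return weight * one_hot + (1.0 - weight) * transformer_probs

# Hypothetical demonstrations and query.
demo_states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
demo_actions = np.array([0, 1, 2])
query = np.array([0.9, 0.1])

ctx_states, ctx_actions, ctx_distances = build_context(query, demo_states, demo_actions, n=2)
transformer_probs = np.full(3, 1.0 / 3.0)  # placeholder for the transformer's prediction
probs = combine(transformer_probs, int(ctx_actions[0]), float(ctx_distances[0]), lam=1.0)
print(ctx_actions, probs)
```

With this weighting, a query that sits close to a retrieved state leans on the retrieved action, while a distant query defers more to the learned policy.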

Theorems & Definitions (4)

  • Definition 5.1: Most Isolated State
  • Theorem 5.2
  • Lemma B.1
  • Theorem B.2