Aligning Agents like Large Language Models

Adam Jelley; Yuhan Cao; Dave Bignell; Amos Storkey; Sam Devlin; Tabish Rashid

TL;DR

This paper advocates treating decision-making agents like large language models by adopting a multi-stage training pipeline (unsupervised pre-training, supervised fine-tuning, and reinforcement learning from preferences) to cultivate generality and alignment in complex 3D environments. Through a pixel-based proof-of-concept in a AAA game, it demonstrates that large-scale imitation learning provides a strong behavioral prior, which can be refined via task-specific fine-tuning and reward-model-guided RLHF to achieve complex, goal-directed behavior that is difficult to obtain with RL or imitation learning alone. The work also shows that reward models can be initialized from the representations learned during imitation and then trained for preference modeling, enabling efficient alignment with relatively few comparisons. It highlights challenges that affect online alignment, such as distribution shift and spawn imbalance. Overall, the authors propose a framework for translating advances in LLMs to embodied agents, potentially enabling more robust, general, and human-aligned agents across games and real-world domains, including robotics.
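As a rough illustration of the preference-modeling step mentioned above, the sketch below computes a Bradley-Terry style loss for a reward model trained on pairwise comparisons. This is the loss commonly used for reward models in the LLM literature; that the paper uses exactly this form is an assumption, and the function names `preference_loss` and `mean_preference_loss` are hypothetical. The rewards are plain floats standing in for the scalar outputs of a reward model on a preferred and a rejected trajectory.

```python
import math
from typing import Iterable, Tuple


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the preferred trajectory wins.

    Under a Bradley-Terry model (assumed here, not confirmed by the source),
    P(preferred beats rejected) = sigmoid(reward_preferred - reward_rejected).
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_preference_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average loss over (reward_preferred, reward_rejected) pairs."""
    losses = [preference_loss(r_pref, r_rej) for r_pref, r_rej in pairs]
    if not losses:
        raise ValueError("at least one comparison is required")
    return sum(losses) / len(losses)
```

The loss shrinks as the reward model scores the preferred trajectory further above the rejected one, and grows roughly linearly when the ordering is wrong, so minimizing it over a set of comparisons pushes the model toward the human ranking.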

Abstract

Training agents to act competently in complex 3D environments from high-dimensional visual information is challenging. Reinforcement learning is conventionally used to train such agents, but it requires a carefully designed reward function and is difficult to scale to obtain robust agents that generalize to new tasks. In contrast, Large Language Models (LLMs) demonstrate impressively general capabilities resulting from large-scale pre-training and post-training alignment, but struggle to act in complex environments. This position paper draws explicit analogies between decision-making agents and LLMs, and argues that agents should be trained like LLMs to achieve more general, robust, and aligned behaviors. We provide a proof-of-concept to demonstrate how the procedure for training LLMs can be used to train an agent in a 3D video game environment from pixels. We investigate the importance of each stage of the LLM training pipeline, while providing guidance and insights for successfully applying this approach to agents. Our paper provides an alternative perspective to contemporary LLM Agents on how recent progress in LLMs can be leveraged for decision-making agents, and we hope it will illuminate a path towards developing more generally capable agents for video games and beyond. Project summary and videos: https://adamjelley.github.io/aligning-agents-like-llms

Paper Structure (32 sections, 1 equation, 20 figures, 1 algorithm)

Introduction
Context and Background
Proof of Concept
Environment and Alignment Goal
Implementation Details
Does large scale pre-training provide generalization benefits in the context of agents?
Preference Modeling
Do modern reward-modeling practices apply in the context of agents?
Aligning the Agent with the Reward Model
Aligning Agent Towards Left Jumppad
Aligning Agent Towards Right Jumppad
Summary of Agent Alignment
Alternative Views and Broader Outlook
Conclusions
Discussion of General Procedure for Aligning Agents
...and 17 more sections

Figures (20)

Figure 1: Illustration of our approach for training general agents in complex environments. Pre-training on a large, diverse dataset of interactions provides a base agent which can be more effectively fine-tuned with limited demonstrations. This agent can be further refined with RL, using a reward model or external reward function. This approach is analogous to the training of modern LLMs.
Figure 2: Screenshots of the Ninja agent at a spawn point and heading towards the middle jumppad of the launch island.
Figure 3: Distribution of jumppads reached by the base agent.
Figure 4: Distribution of jumppads reached by the base agent after supervised fine-tuning (SFT) on a task-specific dataset.
Figure 5: Distribution of jumppads reached by an equivalent agent trained from scratch on the task-specific dataset used for supervised fine-tuning.
...and 15 more figures
