Language Instructed Reinforcement Learning for Human-AI Coordination
Hengyuan Hu, Dorsa Sadigh
TL;DR
This work tackles human-AI coordination in multi-agent RL when abundant human behavioral data is unavailable. It introduces instructRL, a framework that uses a large language model to generate a prior policy conditioned on natural language instructions and regularizes RL training toward that prior, so that the agent converges to human-preferred equilibria. The approach is validated in a toy Say-Select game and on the Hanabi benchmark, showing that different instructions produce semantically distinct policies that satisfy those instructions, and that humans coordinate significantly better with the agent when they know the instruction it was trained with. The results suggest a scalable path to improving human-AI collaboration without large labeled human datasets, with test-time adaptation and multi-modal instruction grounding as promising future directions.
Abstract
One of the fundamental quests of AI is to produce agents that coordinate well with humans. This problem is challenging, especially in domains that lack high-quality human behavioral data, because multi-agent reinforcement learning (RL) often converges to equilibria different from the ones that humans prefer. We propose a novel framework, instructRL, that enables humans to specify what kind of strategies they expect from their AI partners through natural language instructions. We use pretrained large language models to generate a prior policy conditioned on the human instruction and use the prior to regularize the RL objective. This leads to the RL agent converging to equilibria that are aligned with human preferences. We show that instructRL converges to human-like policies that satisfy the given instructions in a proof-of-concept environment as well as the challenging Hanabi benchmark. Finally, we show that knowing the language instruction significantly boosts human-AI coordination performance in human evaluations in Hanabi.
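To make the idea of regularizing RL toward a language-model prior concrete, here is a minimal sketch of one way it could work at action-selection time. It assumes a tabular setting where the agent picks the action maximizing its value estimate plus a weighted log-probability under the prior. The additive form, the weight `lam`, the function name, and all numbers are illustrative assumptions, not details given in the text above.

```python
import math


def regularized_greedy_action(q_values, prior_probs, lam):
    """Return argmax_a [Q(a) + lam * log p_prior(a)].

    q_values:    value estimates, one per action (hypothetical numbers).
    prior_probs: action probabilities from an instruction-conditioned
                 prior policy (hypothetical numbers standing in for an LLM).
    lam:         regularization weight; 0 recovers plain greedy selection.
    """
    scores = [
        q + lam * math.log(max(p, 1e-8))  # clamp to avoid log(0)
        for q, p in zip(q_values, prior_probs)
    ]
    # Ties resolve to the lowest action index.
    return max(range(len(scores)), key=scores.__getitem__)


# Two actions with equal return: plain RL has no reason to prefer either.
tied_q = [1.0, 1.0]
# Assumed prior: given the instruction, the language model favors action 1.
prior = [0.2, 0.8]

action_no_prior = regularized_greedy_action(tied_q, prior, lam=0.0)
action_with_prior = regularized_greedy_action(tied_q, prior, lam=0.5)

# When one action is clearly better, a modest prior weight does not override it.
action_clear_winner = regularized_greedy_action([2.0, 1.0], prior, lam=0.5)
```

In this sketch the prior acts mainly as a tie-breaker among actions of similar value, which is one way to picture how an instruction could steer learning toward a particular equilibrium without sacrificing much return.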
