Learning to Coordinate with Experts
Mohamad H. Danesh, Nguyen X. Khanh, Tu Trinh, Benjamin Plaut
TL;DR
This work formalizes Yield-or-Request Control (YRC-0), an unsupervised learning-to-coordinate-with-experts problem, and introduces YRC-Bench, a robust, open benchmark spanning MiniGrid, Procgen, and CLIPort to study cross-environment generalization. It proposes a soft-constraint reward with interpretable trade-offs via a parameter $\alpha$ and an AUC-based evaluation to compare methods across multiple query costs, complemented by a simulated validator (RLOracle) to guide policy selection without access to the true test distribution. Through a large-scale study of 2,600 policies across 19 environments, the authors show there is no universally best method, reveal substantial room for improvement, and argue that current gains are bottlenecked by narrow policy spaces rather than validation quality. The work provides practical recommendations and a rigorous benchmark to spur development of more expressive coordination policies and validation strategies for safe, generalizable human-AI collaboration.
Abstract
When deployed in the real world, AI agents will inevitably face challenges that exceed their individual capabilities. Leveraging assistance from experts, whether humans or highly capable AI systems, can significantly improve both safety and performance in such situations. Since expert assistance is costly, a central challenge is determining when to consult an expert. In this paper, we explore a novel variant of this problem, termed YRC-0, in which an agent must learn to collaborate with an expert in new environments in an unsupervised manner, that is, without interacting with the expert during training. This setting motivates the development of low-cost, robust approaches for training expert-leveraging agents. To support research in this area, we introduce YRC-Bench, an open-source benchmark that instantiates YRC-0 across diverse environments. YRC-Bench provides a standardized Gym-like API, simulated experts, an evaluation pipeline, and implementations of popular baselines. Toward tackling YRC-0, we propose a validation strategy and evaluate a range of learning methods, offering insights that can inform future research. Codebase: github.com/modanesh/YRC-Bench
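To make the trade-off between expert help and query cost concrete, the following is a minimal sketch of a toy coordination policy, evaluated by averaging its return over several query costs. It is not the YRC-Bench API or the paper's exact metric. The confidence-threshold rule, the always-correct expert, the per-step reward model, and the names `episode_return` and `auc_over_costs` are all assumptions made for illustration.

```python
def episode_return(confidences, threshold, query_cost):
    """Return of a toy confidence-threshold coordination policy.

    Assumptions (not from the paper): each step carries a novice
    confidence in [0, 1]; the expert always earns reward 1.0 but each
    request costs `query_cost`; the novice acting alone earns its
    confidence as expected reward.
    """
    total = 0.0
    for c in confidences:
        if c < threshold:
            # Request the expert: full reward minus the query cost.
            total += 1.0 - query_cost
        else:
            # Act alone: expected reward equals the novice's confidence.
            total += c
    return total


def auc_over_costs(costs, returns):
    """Trapezoidal area under the return-vs-cost curve, normalized by
    the cost range so the result reads as an average return."""
    area = 0.0
    for i in range(1, len(costs)):
        width = costs[i] - costs[i - 1]
        area += 0.5 * (returns[i] + returns[i - 1]) * width
    return area / (costs[-1] - costs[0])


# Hypothetical per-step confidences and query costs.
confidences = [0.2, 0.9, 0.5, 0.7]
costs = [0.0, 0.5, 1.0]
returns = [episode_return(confidences, 0.6, c) for c in costs]
score = auc_over_costs(costs, returns)
```

A single score of this kind allows policies to be compared without committing to one query cost. A policy that requests help too eagerly scores well at low cost but poorly at high cost, and the area aggregates both regimes.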
