ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance
Andrey Risukhin, Kavel Rao, Ben Caffee, Alan Fan
TL;DR
ColorGrid introduces a non-stationary, asymmetric MARL benchmark to study real-time human goal inference and cooperative assistance. Using IPPO as a baseline, the paper shows that standard learning approaches struggle when a follower must infer a leader’s changing goal without explicit communication, even under symmetric information. Various architectural and training strategies, including an auxiliary supervised loss and reward shaping, provide partial benefits. The authors show that exploration cost, penalty annealing, and balanced learning play critical roles, and that warmstarting and supervised objectives can stabilize or enhance learning in certain regimes. The work provides a benchmark, datasets, and visualizations to spur development of algorithms capable of robust goal inference and assistance in real-world human–AI collaboration scenarios.
Abstract
Interactions between autonomous agents and humans increasingly focus on adapting to humans' changing preferences in order to improve assistance in real-world tasks. To collaborate well, effective agents must learn to accurately infer human goals, which are often hidden. However, existing Multi-Agent Reinforcement Learning (MARL) environments lack the attributes required to rigorously evaluate these agents' learning capabilities. To this end, we introduce ColorGrid, a novel MARL environment with customizable non-stationarity, asymmetry, and reward structure. We investigate the performance of Independent Proximal Policy Optimization (IPPO), a state-of-the-art (SOTA) MARL algorithm, in ColorGrid. Through extensive ablations, we find that IPPO does not solve ColorGrid, particularly when goals are simultaneously non-stationary and asymmetric between a "leader" agent representing a human and a "follower" assistant agent. To support benchmarking of future MARL algorithms, we release our environment code, model checkpoints, and trajectory visualizations at https://github.com/andreyrisukhin/ColorGrid.
