Speaking the Language of Teamwork: LLM-Guided Credit Assignment in Multi-Agent Reinforcement Learning
Muhan Lin, Shuyang Shi, Yue Guo, Vaishnav Tadiparthi, Behdad Chalaki, Ehsan Moradi Pari, Simon Stepputtis, Woojun Kim, Joseph Campbell, Katia Sycara
TL;DR
Credit assignment in MARL is especially hard under sparse rewards. The paper proposes LLM-guided Credit Assignment (LCA), which uses an LLM to generate dense, agent-specific rewards: the LLM ranks states from each agent's perspective, and a per-agent potential function learned from those rankings is used to shape that agent's reward. This potential-based decomposition mitigates uncertainty in the rankings and supports centralized training with decentralized execution (CTDE), improving convergence and policy performance. Experiments in grid-world and Pistonball environments show faster learning, higher returns, and robustness to ranking errors, including when smaller LLMs are used, suggesting the approach is practical for scalable MARL settings with sparse feedback.
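To make the reward-shaping step concrete, here is a minimal sketch of the standard potential-based shaping rule, r_i' = r + gamma * phi_i(s') - phi_i(s), applied once per agent. Whether the paper uses exactly this form is an assumption. The function name `shaped_rewards`, the toy potentials, and the 1-D position state are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Hashable

State = Hashable
Potential = Callable[[State], float]


def shaped_rewards(
    team_reward: float,
    state: State,
    next_state: State,
    potentials: Dict[str, Potential],
    gamma: float = 0.99,
) -> Dict[str, float]:
    """Return one shaped reward per agent.

    Each agent i gets: team_reward + gamma * phi_i(next_state) - phi_i(state),
    the standard potential-based shaping form (assumed here, not taken
    from the paper).
    """
    return {
        agent: team_reward + gamma * phi(next_state) - phi(state)
        for agent, phi in potentials.items()
    }


# Hypothetical toy potentials over a 1-D position.
toy_potentials = {
    # agent_0 is assumed to be rated higher the closer it is to position 5.
    "agent_0": lambda s: -abs(s - 5.0),
    # agent_1 is assumed to be unaffected by this state variable.
    "agent_1": lambda s: 0.0,
}

# Sparse team reward of 0; the position moves from 2 to 3.
example = shaped_rewards(0.0, 2.0, 3.0, toy_potentials, gamma=1.0)
```

In this toy transition the sparse team reward is zero, yet `agent_0` receives a positive shaped reward for moving toward its higher-potential region while `agent_1` receives none, which is the kind of agent-specific signal the summary describes.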
Abstract
Credit assignment, the process of attributing credit or blame to individual agents for their contributions to a team's success or failure, remains a fundamental challenge in multi-agent reinforcement learning (MARL), particularly in environments with sparse rewards. Commonly used approaches such as value decomposition often lead to suboptimal policies in these settings, and designing dense reward functions that align with human intuition can be complex and labor-intensive. In this work, we propose a novel framework in which a large language model (LLM) generates dense, agent-specific rewards based on a natural language description of the task and the overall team goal. By learning a potential-based reward function over multiple queries, our method reduces the impact of ranking errors while allowing the LLM to evaluate each agent's contribution to the overall task. Through extensive experiments, we demonstrate that our approach achieves faster convergence and higher policy returns compared to state-of-the-art MARL baselines.
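As a rough illustration of why learning a potential over multiple queries can dampen individual ranking errors, the sketch below fits a tabular potential to pairwise preferences with a Bradley-Terry (logistic) model. This is a common way to turn rankings into scores, swapped in here for illustration; it is an assumption, not the paper's stated training procedure. The function `fit_potential`, the three-state setup, and the preference list are all hypothetical.

```python
import math
from typing import List, Tuple


def fit_potential(
    num_states: int,
    preferences: List[Tuple[int, int]],
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Fit one potential value per state from (better, worse) index pairs.

    Uses stochastic gradient ascent on the Bradley-Terry log-likelihood,
    where P(better beats worse) = sigmoid(phi[better] - phi[worse]).
    """
    phi = [0.0] * num_states
    for _ in range(epochs):
        for better, worse in preferences:
            p = 1.0 / (1.0 + math.exp(-(phi[better] - phi[worse])))
            grad = 1.0 - p  # gradient of log p w.r.t. phi[better]
            phi[better] += lr * grad
            phi[worse] -= lr * grad
    return phi


# Hypothetical rankings for one agent over three states (0, 1, 2).
# The last pair contradicts the two (2, 1) pairs, standing in for one
# erroneous LLM ranking.
noisy_preferences = [(2, 1), (1, 0), (2, 1), (1, 2)]
phi = fit_potential(3, noisy_preferences)
```

Because state 2 is preferred over state 1 in two of the three queries that compare them, the fitted potential still orders the states 2 > 1 > 0 despite the contradictory pair; the single error narrows the margin rather than flipping the ordering.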
