QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?
Zhouyang Jiang, Bin Zhang, Yuanjun Li, Zhiwei Xu
TL;DR
This work tackles credit assignment in cooperative multi-agent reinforcement learning (MARL) by replacing the traditional mixing network with a Training-Free Credit Assignment Function (TFCAF) generated by large language models (LLMs). A coder-evaluator framework constructs the TFCAF zero-shot, with automated error checking and selection, while an IGM-Gating mechanism gives task-adaptive control over the monotonicity constraint tied to the Individual-Global-Max (IGM) principle. Across MARL benchmarks, QLLM outperforms the baselines and carries over to a range of mixing-based algorithms, with fewer trainable parameters and better interpretability. The approach points toward scalable, human-readable credit attribution in complex multi-agent systems and is broadly compatible with existing centralized training with decentralized execution (CTDE) frameworks.
Abstract
Credit assignment remains a fundamental challenge in multi-agent reinforcement learning (MARL). Previous studies have primarily addressed it through value decomposition methods under the centralized training with decentralized execution (CTDE) paradigm, in which neural networks approximate the nonlinear relationship between individual Q-values and the global Q-value. Although these approaches have achieved considerable success on various benchmark tasks, they still suffer from several limitations, including imprecise attribution of contributions, limited interpretability, and poor scalability in high-dimensional state spaces. To address these challenges, we propose QLLM, a novel algorithm that uses large language models (LLMs) to construct credit assignment functions automatically. Specifically, we introduce the Training-Free Credit Assignment Function (TFCAF), which represents the credit allocation process as a direct and expressive nonlinear functional formulation. A custom-designed coder-evaluator framework guides the generation and verification of executable code by LLMs, significantly mitigating issues such as hallucination and shallow reasoning during inference. Furthermore, an IGM-Gating Mechanism lets QLLM enforce or relax the monotonicity constraint associated with the Individual-Global-Max (IGM) principle depending on task demands, covering both IGM-compliant and non-monotonic scenarios. Extensive experiments on several standard MARL benchmarks demonstrate that the proposed method consistently outperforms existing state-of-the-art baselines. Moreover, QLLM generalizes well and remains compatible with a wide range of MARL algorithms that use mixing networks, positioning it as a promising and versatile solution for complex multi-agent scenarios. The code is available at https://github.com/zhouyangjiang71-sys/QLLM.
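
To make the idea of a training-free credit assignment function with an IGM gate more concrete, the following is a minimal sketch, not the authors' generated code. The abstract does not specify the functional form, so everything here is an assumption made for illustration: the state-weighted sum, the hypothetical `example_weights` rule, and the `enforce_igm` flag. The only point it shows is that a closed-form function with no trainable parameters can combine per-agent Q-values, and that taking absolute values of the weights is one simple way to make the global value non-decreasing in each agent's Q-value.

```python
from typing import Callable, List, Sequence


def example_weights(state: Sequence[float], n_agents: int) -> List[float]:
    """Hypothetical hand-written rule (assumption): agent i's raw credit
    weight is 1 minus the i-th state feature. Nothing here is learned."""
    return [1.0 - state[i] for i in range(n_agents)]


def tfcaf(
    agent_qs: Sequence[float],
    state: Sequence[float],
    weight_fn: Callable[[Sequence[float], int], List[float]] = example_weights,
    enforce_igm: bool = True,
) -> float:
    """Combine per-agent Q-values into a global value without trainable parameters.

    When enforce_igm is True, the weights are made non-negative, so the result
    is monotonically non-decreasing in every agent's Q-value. When it is False,
    the raw (possibly negative) weights are used and monotonicity is relaxed.
    """
    weights = weight_fn(state, len(agent_qs))
    if enforce_igm:
        weights = [abs(w) for w in weights]
    return sum(w * q for w, q in zip(weights, agent_qs))


if __name__ == "__main__":
    qs = [1.0, 2.0]
    s = [0.5, 1.5]  # raw weights become [0.5, -0.5]
    print(tfcaf(qs, s, enforce_igm=True))
    print(tfcaf(qs, s, enforce_igm=False))
```

With the gate on, raising any single agent's Q-value can never lower the global value, which is the monotonicity property that mixing-network methods typically enforce through non-negative mixing weights. With the gate off, the same function can express non-monotonic relationships.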
