LERO: LLM-driven Evolutionary framework with Hybrid Rewards and Enhanced Observation for Multi-Agent Reinforcement Learning
Yuan Wei, Xiaohan Shan, Jianmin Li
TL;DR
The paper tackles the twin challenges of credit assignment and partial observability in multi-agent reinforcement learning (MARL) by introducing LERO, an LLM-driven evolutionary framework that jointly optimizes two modular components: a hybrid reward function (HRF) and an observation enhancement function (OEF). An outer evolutionary loop uses an LLM as the evolutionary operator, with a selector module ranking candidate HRFs and OEFs across MARL training runs to guide subsequent generations. The approach is algorithm-agnostic and evaluated on Cooperative Navigation tasks in the Multi-Agent Particle Environment (MPE) across MAPPO, VDN, and QMIX, showing superior performance and faster convergence relative to native baselines and ablated variants. The results demonstrate that LLM-informed design and evolutionary refinement can substantially improve coordination and training efficiency in MARL, suggesting a scalable pathway for integrating language-model reasoning into multi-agent learning systems.
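The outer loop described above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: `propose` is a hypothetical stand-in for the LLM operator (here it only perturbs two numbers), `evaluate` is a hypothetical stand-in for a full MARL training run (here a toy objective), and the elitist selection that keeps earlier parents in the pool is an assumption of this sketch.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    # Hypothetical stand-ins: in LERO these would be LLM-written functions,
    # not scalar parameters.
    hrf_weight: float  # proxy for a hybrid reward function (HRF)
    oef_gain: float    # proxy for an observation enhancement function (OEF)
    score: float = 0.0


def _clip(x: float) -> float:
    return min(1.0, max(0.0, x))


def propose(parents: List[Candidate], n: int, rng: random.Random) -> List[Candidate]:
    """Stand-in for the LLM operator: generate n candidates from the parents."""
    if not parents:  # first generation: no feedback available yet
        return [Candidate(rng.random(), rng.random()) for _ in range(n)]
    children = []
    for _ in range(n):
        p = rng.choice(parents)
        children.append(Candidate(_clip(p.hrf_weight + rng.gauss(0, 0.1)),
                                  _clip(p.oef_gain + rng.gauss(0, 0.1))))
    return children


def evaluate(c: Candidate) -> float:
    """Stand-in for one MARL training run; toy objective peaking at (0.6, 0.8)."""
    return -((c.hrf_weight - 0.6) ** 2 + (c.oef_gain - 0.8) ** 2)


def evolve(generations: int = 5, pop_size: int = 6, top_k: int = 2,
           seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    parents: List[Candidate] = []
    for _ in range(generations):
        population = propose(parents, pop_size, rng)
        for c in population:
            c.score = evaluate(c)
        # Selector: rank new candidates together with the previous parents
        # (elitism, assumed here) and keep the top_k to guide the next round.
        pool = sorted(population + parents, key=lambda c: c.score, reverse=True)
        parents = pool[:top_k]
    return parents[0]


if __name__ == "__main__":
    best = evolve()
    print(best)
```

In this sketch the best score never decreases from one generation to the next, because the selector ranks the previous parents alongside the new candidates.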
Abstract
Multi-agent reinforcement learning (MARL) faces two critical bottlenecks distinct from single-agent RL: credit assignment in cooperative tasks and partial observability of environmental states. We propose LERO, a framework integrating large language models (LLMs) with evolutionary optimization to address these MARL-specific challenges. The solution centers on two LLM-generated components: a hybrid reward function that dynamically allocates individual credit through reward decomposition, and an observation enhancement function that augments partial observations with inferred environmental context. An evolutionary algorithm optimizes these components through iterative MARL training cycles, where top-performing candidates guide subsequent LLM generations. Evaluations in the Multi-Agent Particle Environment (MPE) demonstrate LERO's superiority over baseline methods, with improved task performance and training efficiency.
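To make the two components concrete, the following sketch shows one possible shape for a hybrid reward and an observation enhancement in a Cooperative Navigation-style setting. Everything in it is an illustrative assumption rather than the functions LERO actually generates: the names `hybrid_reward` and `enhance_observation`, the mixing weight `alpha`, the distance-based individual term, and the premise that landmark positions are available to the agent.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_landmark(agent_pos: Point, landmarks: Sequence[Point]) -> Point:
    """Return the landmark closest to the agent (Euclidean distance)."""
    return min(landmarks, key=lambda lm: math.dist(agent_pos, lm))


def hybrid_reward(team_reward: float, agent_pos: Point,
                  landmarks: Sequence[Point], alpha: float = 0.5) -> float:
    """Blend the shared team reward with a per-agent term.

    The individual term (negative distance to the nearest landmark) and the
    fixed weight alpha are assumptions chosen for illustration.
    """
    individual = -math.dist(agent_pos, nearest_landmark(agent_pos, landmarks))
    return alpha * team_reward + (1.0 - alpha) * individual


def enhance_observation(obs: Sequence[float], agent_pos: Point,
                        landmarks: Sequence[Point]) -> List[float]:
    """Append derived features to the raw observation.

    Here the added context is the offset and distance to the nearest
    landmark; a real enhancement would use whatever the agent can infer.
    """
    lm = nearest_landmark(agent_pos, landmarks)
    dx, dy = lm[0] - agent_pos[0], lm[1] - agent_pos[1]
    return list(obs) + [dx, dy, math.hypot(dx, dy)]


if __name__ == "__main__":
    landmarks = [(0.0, 0.0), (3.0, 4.0)]
    print(hybrid_reward(-2.0, (3.0, 0.0), landmarks))
    print(enhance_observation([1.0], (3.0, 0.0), landmarks))
```

Setting `alpha` to 1 in this sketch recovers a purely shared team reward, while lower values shift credit toward each agent's own contribution.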
