Agent$^2$: An Agent-Generates-Agent Framework for Reinforcement Learning Automation

Yuan Wei, Xiaohan Shan, Ran Miao, Jianmin Li

TL;DR

Agent$^2$ tackles the barriers to practical RL deployment with a dual-agent, LLM-driven framework that fully automates RL agent design. It decomposes development into two stages, MDP modeling and algorithmic optimization, under the Model Context Protocol, enabling end-to-end agent generation without human intervention. Across MuJoCo, MetaDrive, MPE, and SMAC, Agent$^2$ outperforms manually designed baselines on all tasks, with improvements of up to 55%, and ablation studies show substantial benefits from both MDP adaptation and subsequent optimization. The work demonstrates a scalable paradigm in which agents design and refine other agents, accelerating automated AI development and broadening RL applicability.

Abstract

Reinforcement learning (RL) agent development traditionally requires substantial expertise and iterative effort, often leading to high failure rates and limited accessibility. This paper introduces Agent$^2$, an LLM-driven agent-generates-agent framework for fully automated RL agent design. Agent$^2$ autonomously translates natural language task descriptions and environment code into executable RL solutions without human intervention. The framework adopts a dual-agent architecture: a Generator Agent that analyzes tasks and designs agents, and a Target Agent that is automatically generated and executed. To better support automation, RL development is decomposed into two stages, MDP modeling and algorithmic optimization, facilitating targeted and effective agent generation. Built on the Model Context Protocol, Agent$^2$ provides a unified framework for standardized agent creation across diverse environments and algorithms, incorporating adaptive training management and intelligent feedback analysis for continuous refinement. Extensive experiments on benchmarks including MuJoCo, MetaDrive, MPE, and SMAC show that Agent$^2$ outperforms manually designed baselines across all tasks, achieving up to 55\% performance improvement with consistent average gains. By enabling a closed-loop, end-to-end automation pipeline, this work advances a new paradigm in which agents can design and optimize other agents, underscoring the potential of agent-generates-agent systems for automated AI development.
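The closed loop described in the abstract (a Generator Agent proposes a design, the Target Agent is trained, and feedback drives the next proposal) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `TargetAgentSpec`, `refine_loop`, `toy_propose`, and `toy_evaluate` are hypothetical names, the LLM call and the training run are replaced by deterministic stand-ins, and the `reward_scale` knob and algorithm name are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TargetAgentSpec:
    """Hypothetical container for what the Generator Agent produces."""
    mdp: dict        # stage 1: state space, action space, reward function choices
    algorithm: dict  # stage 2: algorithm, network architecture, hyperparameters


def refine_loop(
    propose: Callable[[Optional[dict]], TargetAgentSpec],
    evaluate: Callable[[TargetAgentSpec], float],
    rounds: int,
) -> Tuple[TargetAgentSpec, float, List[float]]:
    """Propose a spec, evaluate it, and feed the outcome into the next proposal."""
    feedback: Optional[dict] = None
    best_spec, best_score = None, float("-inf")
    history: List[float] = []
    for _ in range(rounds):
        spec = propose(feedback)   # stand-in for the Generator Agent
        score = evaluate(spec)     # stand-in for training the Target Agent
        history.append(score)
        if score > best_score:
            best_spec, best_score = spec, score
        feedback = {"last_spec": spec, "last_score": score, "best_score": best_score}
    return best_spec, best_score, history


def toy_propose(feedback: Optional[dict]) -> TargetAgentSpec:
    # Assumption: a real system would query an LLM here; this stub only
    # increments a made-up reward scale on every round.
    scale = 1.0 if feedback is None else feedback["last_spec"].mdp["reward_scale"] + 1.0
    return TargetAgentSpec(
        mdp={"reward_scale": scale},
        algorithm={"name": "placeholder-algo", "lr": 3e-4},
    )


def toy_evaluate(spec: TargetAgentSpec) -> float:
    # Assumption: a real system would train and evaluate the Target Agent;
    # this stub simply scores highest when reward_scale equals 3.
    return -abs(spec.mdp["reward_scale"] - 3.0)


if __name__ == "__main__":
    best, score, history = refine_loop(toy_propose, toy_evaluate, rounds=5)
    print(best.mdp, score, history)
```

The loop keeps the best specification seen so far rather than the last one, so a later proposal that performs worse does not overwrite an earlier good design. Whether Agent$^2$ uses this exact selection rule is not stated in the abstract.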

Paper Structure

This paper contains 21 sections, 4 equations, 5 figures, 4 tables, and 2 algorithms.

Figures (5)

  • Figure 1: The Agent$^2$ framework consists of three main stages. First, Agent$^2$ analyzes the problem, taking a natural language task description and the environment code as inputs. Second, it performs MDP modeling, designing the state space, action space, and reward function. Third, it performs algorithmic optimization, autonomously selecting an appropriate algorithm, designing the network architecture, and tuning hyperparameters. The entire framework complies with the Model Context Protocol (MCP), which standardizes service integration. The generated components are then assembled into the Target Agent, ready for training and evaluation, and the whole process supports iterative refinement to improve solution quality.
  • Figure 2: Training curve comparisons on MuJoCo environments.
  • Figure 3: Training curve comparisons on MetaDrive and MPE environments.
  • Figure 4: Training curve comparisons on StarCraft-II environments.
  • Figure 5: Performance improvement across Task-to-MDP Mapping and Algorithmic Optimization.
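A per-stage breakdown like the one in Figure 5 can be computed along the following lines. This is only a sketch: it assumes improvement is measured as percent change relative to the preceding configuration, which the text above does not specify, and the function names and the numbers in the usage example are invented for illustration and are not results from the paper.

```python
def relative_improvement(baseline: float, score: float) -> float:
    """Percent change of `score` over `baseline`, relative to |baseline|."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (score - baseline) / abs(baseline)


def stage_breakdown(baseline: float, after_mdp: float, after_opt: float) -> dict:
    """Split the total gain into the two stages named in Figure 5.

    Assumption: each stage is compared against the configuration
    that immediately precedes it.
    """
    return {
        "task_to_mdp": relative_improvement(baseline, after_mdp),
        "algorithmic_optimization": relative_improvement(after_mdp, after_opt),
        "total": relative_improvement(baseline, after_opt),
    }


if __name__ == "__main__":
    # Made-up returns for illustration only.
    print(stage_breakdown(baseline=100.0, after_mdp=120.0, after_opt=150.0))
```

Dividing by the absolute value of the baseline keeps the sign meaningful when returns are negative, which is common in control benchmarks.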