$Agent^2$: An Agent-Generates-Agent Framework for Reinforcement Learning Automation

Yuan Wei; Xiaohan Shan; Ran Miao; Jianmin Li

TL;DR

Agent$^2$ tackles the barriers to practical RL deployment by introducing a dual-agent, LLM-driven framework that fully automates RL agent design. It decomposes development into two stages—MDP modeling and algorithmic optimization—under the Model Context Protocol, enabling end-to-end agent generation without human intervention. Across MuJoCo, MetaDrive, MPE, and SMAC, Agent$^2$ yields consistent performance gains over manually designed baselines, with ablation studies showing substantial benefits from both MDP adaptation and subsequent optimization. This work demonstrates a scalable paradigm where agents design and refine other agents, accelerating automated AI development and broadening RL applicability.

Abstract

Reinforcement learning (RL) agent development traditionally requires substantial expertise and iterative effort, often leading to high failure rates and limited accessibility. This paper introduces Agent$^2$, an LLM-driven agent-generates-agent framework for fully automated RL agent design. Agent$^2$ autonomously translates natural language task descriptions and environment code into executable RL solutions without human intervention. The framework adopts a dual-agent architecture: a Generator Agent that analyzes tasks and designs agents, and a Target Agent that is automatically generated and executed. To better support automation, RL development is decomposed into two stages, MDP modeling and algorithmic optimization, facilitating targeted and effective agent generation. Built on the Model Context Protocol, Agent$^2$ provides a unified framework for standardized agent creation across diverse environments and algorithms, incorporating adaptive training management and intelligent feedback analysis for continuous refinement. Extensive experiments on benchmarks including MuJoCo, MetaDrive, MPE, and SMAC show that Agent$^2$ outperforms manually designed baselines across all tasks, achieving up to 55% performance improvement with consistent average gains. By enabling a closed-loop, end-to-end automation pipeline, this work advances a new paradigm in which agents can design and optimize other agents, underscoring the potential of agent-generates-agent systems for automated AI development.
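The two-stage, closed-loop structure described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the framework's implementation: it uses no LLM and no Model Context Protocol calls, and the names `AgentSpec`, `toy_evaluate`, `refine`, and `generate_target_agent` are hypothetical. A deterministic scoring function stands in for training and evaluating a Target Agent, a single `reward_scale` field stands in for MDP modeling decisions, and a single `learning_rate` field stands in for algorithmic optimization decisions.

```python
import math
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of a Target Agent (not the paper's schema)."""
    reward_scale: float   # stand-in for Stage 1 (MDP modeling) choices
    learning_rate: float  # stand-in for Stage 2 (algorithmic optimization) choices


def toy_evaluate(spec: AgentSpec) -> float:
    """Stand-in for training and evaluating a Target Agent.

    Assumption for illustration only: the score is highest (0.0) at
    reward_scale=1.0 and learning_rate=3e-4, and drops with log-distance.
    """
    return (-abs(math.log10(spec.reward_scale))
            - abs(math.log10(spec.learning_rate / 3e-4)))


def refine(spec: AgentSpec, field: str, candidates: Sequence[float],
           evaluate: Callable[[AgentSpec], float]) -> AgentSpec:
    """Feedback step: try each candidate for one field, keep the best spec."""
    best, best_score = spec, evaluate(spec)
    for value in candidates:
        trial = replace(spec, **{field: value})
        score = evaluate(trial)
        if score > best_score:
            best, best_score = trial, score
    return best


def generate_target_agent(evaluate: Callable[[AgentSpec], float]) -> AgentSpec:
    """Hypothetical Generator Agent loop: Stage 1, then Stage 2."""
    spec = AgentSpec(reward_scale=0.1, learning_rate=1e-3)
    # Stage 1: adapt the MDP formulation (here, only the reward scale).
    spec = refine(spec, "reward_scale", [0.1, 1.0, 10.0], evaluate)
    # Stage 2: optimize the algorithm (here, only the learning rate).
    spec = refine(spec, "learning_rate", [1e-4, 3e-4, 1e-3], evaluate)
    return spec


if __name__ == "__main__":
    print(generate_target_agent(toy_evaluate))
```

In this sketch each stage keeps a change only when the evaluation feedback improves, which mirrors the idea of refining the MDP formulation first and tuning the algorithm second. In the framework as described, those proposals would come from the LLM-driven Generator Agent rather than from fixed candidate lists.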

