Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang
TL;DR
Agent Lightning presents a decoupled, agent-agnostic RL framework for training LLM-based agents by recasting agent execution as an MDP and unifying collected data into transition-style trajectories. It introduces LightningRL, a hierarchical RL approach that allows existing single-turn RL methods to be reused, and a Training-Agent Disaggregation architecture that separates training from the agent runtime. The framework supports robust data capture, Automatic Intermediate Rewarding (AIR), and scalable rollout through a two-component server-client design, and is demonstrated on text-to-SQL, retrieval-augmented generation, and math tool-use tasks with consistent improvements. This work offers a general, scalable path to real-world agent optimization, enabling continuous learning and deployment-ready agent capabilities with minimal code changes.
Abstract
We present Agent Lightning, a flexible and extensible framework that enables Reinforcement Learning (RL)-based training of Large Language Models (LLMs) for any AI agent. Unlike existing methods that tightly couple RL training with the agent or rely on sequence concatenation with masking, Agent Lightning achieves complete decoupling between agent execution and training, allowing seamless integration with existing agents developed in diverse ways (e.g., with frameworks like LangChain, OpenAI Agents SDK, and AutoGen, or built from scratch) with almost ZERO code modifications. By formulating agent execution as a Markov decision process, we define a unified data interface and propose a hierarchical RL algorithm, LightningRL, which contains a credit assignment module, allowing us to decompose trajectories generated by ANY agent into training transitions. This enables RL to handle complex interaction logic, such as multi-agent scenarios and dynamic workflows. For the system design, we introduce a Training-Agent Disaggregation architecture and bring agent observability frameworks into the agent runtime, providing a standardized agent finetuning interface. Experiments across text-to-SQL, retrieval-augmented generation, and math tool-use tasks demonstrate stable, continuous improvements, showcasing the framework's potential for real-world agent training and deployment.
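To make the idea of decomposing a trajectory into training transitions concrete, the following is a minimal sketch and not the framework's actual API. The names `LLMCall`, `Transition`, and `assign_credit` are hypothetical, and the credit assignment rule shown here (every LLM call in an episode receives the same episode-level reward) is an assumed, simplest-possible choice used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LLMCall:
    """One LLM invocation observed during an agent run (hypothetical record)."""
    prompt: str    # input seen by the policy LLM at this step
    response: str  # text generated by the policy LLM at this step


@dataclass
class Transition:
    """A single-turn training sample: (input, output, reward)."""
    prompt: str
    response: str
    reward: float


def assign_credit(calls: List[LLMCall], final_reward: float) -> List[Transition]:
    """Split one episode into per-call transitions.

    Assumption: each call simply inherits the episode-level reward.
    """
    return [Transition(c.prompt, c.response, final_reward) for c in calls]


# A toy two-step text-to-SQL episode: write a query, then check it.
episode = [
    LLMCall("Write SQL for: count users", "SELECT COUNT(*) FROM users"),
    LLMCall("Check this SQL: SELECT COUNT(*) FROM users", "The query looks correct."),
]
transitions = assign_credit(episode, final_reward=1.0)
for t in transitions:
    print(t.reward, "|", t.response)
```

Because each resulting transition is an independent (input, output, reward) sample, it could in principle be passed to a single-turn RL method without concatenating the whole episode into one masked sequence.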
