FLEX: Continuous Agent Evolution via Forward Learning from Experience

Zhicheng Cai, Xinyuan Guo, Yu Pei, Jiangtao Feng, Jinsong Su, Jiangjie Chen, Ya-Qin Zhang, Wei-Ying Ma, Mingxuan Wang, Hao Zhou

TL;DR

Static pretrained LLM agents struggle to grow with experience. FLEX introduces a gradient-free Forward Learning from Experience framework that builds a persistent, hierarchical experience library through extensive forward exploration and semantic distillation, guiding future reasoning without parameter updates. The approach yields significant improvements across mathematics, chemistry, and biology, and reveals a scaling law of experiential growth as well as cross-agent inheritance of knowledge. By decoupling learning from weights, FLEX enables continual evolution, transferable strategies, and a path toward collective, transparent AI wisdom.

Abstract

Autonomous agents driven by Large Language Models (LLMs) have revolutionized reasoning and problem-solving but remain static after training, unable to grow with experience as intelligent beings do during deployment. We introduce Forward Learning with EXperience (FLEX), a gradient-free learning paradigm that enables LLM agents to continuously evolve through accumulated experience. Specifically, FLEX cultivates scalable and inheritable evolution by constructing a structured experience library through continual reflection on successes and failures during interaction with the environment. FLEX delivers substantial improvements on mathematical reasoning, chemical retrosynthesis, and protein fitness prediction (up to 23% on AIME25, 10% on USPTO50k, and 14% on ProteinGym). We further identify a clear scaling law of experiential growth and the phenomenon of experience inheritance across agents, marking a step toward scalable and inheritable continuous agent evolution. Project Page: https://flex-gensi-thuair.github.io.

Paper Structure

This paper contains 45 sections, 1 theorem, 7 equations, 7 figures, 3 tables.

Key Result

Corollary 1

The optimal experience library $\mathcal{E}^*$ that maximizes $\mathcal{J}(\mathcal{E})$ can be approximated by minimizing the expected conditional entropy:

Figures (7)

  • Figure 1: An overview of our FLEX paradigm and main results. Top: A comparison between traditional gradient-based learning, which uses back-propagation as the optimization method, and our proposed Forward Learning from Experience paradigm. Bottom: Main experimental results demonstrating FLEX's effectiveness. We evaluate FLEX against strong baselines across three challenging scientific domains (Mathematics, Chemistry, and Biology) on a diverse suite of over 10 models, where FLEX consistently and substantially outperforms the baselines at a cost of less than 100 USD for both training and evaluation of a single agent.
  • Figure 2: Illustration of the Meta-MDP formulation of FLEX. The Base-level MDP performs intra-sample exploration and experience distillation, while the Meta-level MDP integrates these experiences to evolve the global experience library through forward updates.
  • Figure 3: Concrete instantiation of FLEX. The actor-critic refinement loop iteratively explores and refines experiences; the meta-level updater then dynamically organizes the distilled experiences into the evolving experience library.
  • Figure 4: Training dynamics and scaling laws of FLEX on the GSM8K dataset across 5 epochs. Both training accuracy and test accuracy scale strongly with the size of the experience library. The experience library itself also exhibits a scaling law over the epochs.
  • Figure 5: Qualitative case studies in Mathematics, Chemistry, and Biology demonstrating the effectiveness of FLEX. In each domain, baseline agents (LLM Response, ReAct Response) fail due to critical reasoning errors (marked with ✗). In contrast, by retrieving and applying distilled knowledge (e.g., Golden rules and Warnings) from its experience library, FLEX successfully refines its strategy, overcomes the initial failures, and arrives at the correct solution (marked with ✓).
  • ...and 2 more figures
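
The loop described in the captions of Figures 2 and 3 (explore a sample, distill the outcome into an experience, then update the library with no gradient step) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `ExperienceLibrary`, `explore_sample`, `toy_actor`, and `toy_critic` are hypothetical, the actor and critic are hard-coded stand-ins for LLM calls, and retrieval simply returns every entry. The two entry kinds borrow the "Golden rules" and "Warnings" terminology from Figure 5.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ExperienceLibrary:
    """Toy library (hypothetical): entry kind -> list of distilled text entries."""
    entries: Dict[str, List[str]] = field(
        default_factory=lambda: {"golden_rule": [], "warning": []}
    )

    def retrieve(self) -> List[str]:
        # Assumption: return everything; a real system would rank by relevance.
        return self.entries["golden_rule"] + self.entries["warning"]

    def update(self, kind: str, text: str) -> bool:
        """Forward update: store the entry unless it is an exact duplicate."""
        if text in self.entries[kind]:
            return False
        self.entries[kind].append(text)
        return True

    def __len__(self) -> int:
        return sum(len(v) for v in self.entries.values())


Actor = Callable[[str, List[str]], str]          # (question, hints) -> answer
Critic = Callable[[str, str], Tuple[bool, str]]  # (question, answer) -> (ok, lesson)


def explore_sample(question: str, library: ExperienceLibrary,
                   actor: Actor, critic: Critic, max_rounds: int = 3) -> bool:
    """Attempt one sample, distilling a lesson into the library after every try."""
    hints = library.retrieve()
    for _ in range(max_rounds):
        answer = actor(question, hints)
        ok, lesson = critic(question, answer)
        # Successes become golden rules, failures become warnings.
        library.update("golden_rule" if ok else "warning", lesson)
        if ok:
            return True
        hints = library.retrieve()  # the next attempt sees the new warning
    return False


def toy_actor(question: str, hints: List[str]) -> str:
    """Stand-in for an LLM: concatenates operands unless a hint corrects it."""
    a, b = question.split("+")
    if any("add the integers" in h for h in hints):
        return str(int(a) + int(b))
    return a + b  # deliberate mistake: string concatenation


def toy_critic(question: str, answer: str) -> Tuple[bool, str]:
    """Stand-in for a critic with access to the ground truth."""
    a, b = question.split("+")
    if answer == str(int(a) + int(b)):
        return True, "Parse operands as integers before adding."
    return False, "Concatenation is wrong here; add the integers."


if __name__ == "__main__":
    lib = ExperienceLibrary()
    print(explore_sample("2+3", lib, toy_actor, toy_critic), len(lib))
    print(explore_sample("10+4", lib, toy_actor, toy_critic, max_rounds=1), len(lib))
```

In this toy run the first question fails once, stores a warning, and succeeds on the retry. The second question succeeds on its first attempt because the stored warning is retrieved, which mirrors the idea that behavior improves through the library alone while the actor itself is never modified.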

Theorems & Definitions (5)

  • Definition 1: Optimization Objective of FLEX
  • Definition 2: Update Rule of FLEX
  • Corollary 1: Information-Theoretic Reformulation of the Objective
  • Definition 3: Meta-level MDP for Experience Library Evolution
  • Definition 4: Base-level MDP for Experience Exploration