Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents

Xiang Chen, Yuling Shi, Qizhen Lan, Yuchao Qiu, Xiaodong Gu

TL;DR

Fed-SE tackles privacy-constrained, cross-environment evolution of LLM agents by combining local, trajectory-filtered self-improvement with global, low-rank aggregation of adapter updates. By freezing the base model and updating lightweight LoRA adapters, it preserves general reasoning while specializing to environments; global aggregation in a low-rank subspace mitigates negative transfer. Empirical results across five heterogeneous tasks show an ~18% improvement over federated baselines and strong gains in long-horizon, reasoning-heavy tasks like Maze. The work demonstrates a practical, communication-efficient pathway to scalable, privacy-preserving continual learning for distributed LLM agents.

Abstract

LLM agents are widely deployed in complex interactive tasks, yet privacy constraints often preclude centralized optimization and co-evolution across dynamic environments. While Federated Learning (FL) has proven effective on static datasets, its extension to the open-ended self-evolution of agents remains underexplored. Directly applying standard FL is challenging: heterogeneous tasks and sparse, trajectory-level rewards introduce severe gradient conflicts, destabilizing the global optimization process. To bridge this gap, we propose Fed-SE, a Federated Self-Evolution framework for LLM agents. Fed-SE establishes a local evolution-global aggregation paradigm. Locally, agents employ parameter-efficient fine-tuning on filtered, high-return trajectories to achieve stable gradient updates. Globally, Fed-SE aggregates updates within a low-rank subspace that disentangles environment-specific dynamics, effectively reducing negative transfer across clients. Experiments across five heterogeneous environments demonstrate that Fed-SE improves average task success rates by approximately 18% over federated baselines, validating its effectiveness in robust cross-environment knowledge transfer in privacy-constrained deployments.
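The local step described above (fine-tuning only on filtered, high-return trajectories) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `Trajectory` and `ExperienceBuffer` classes, the `min_return` threshold, and the assumption that the sparse trajectory-level reward is 1.0 on success and 0.0 otherwise. The fine-tuning itself is omitted; the sketch only shows how filtered rollouts might accumulate on a client and be flattened into training pairs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Trajectory:
    """One episode: (observation, action) steps plus a trajectory-level return."""
    steps: List[Tuple[str, str]]
    episode_return: float  # assumption: 1.0 on success, 0.0 otherwise


@dataclass
class ExperienceBuffer:
    """Client-side buffer; in this sketch its contents never leave the client."""
    kept: List[Trajectory] = field(default_factory=list)

    def add_filtered(self, rollouts: List[Trajectory], min_return: float = 1.0) -> int:
        """Keep only rollouts whose return reaches the threshold; return how many were kept."""
        good = [t for t in rollouts if t.episode_return >= min_return]
        self.kept.extend(good)  # accumulate across rounds instead of overwriting
        return len(good)

    def training_pairs(self) -> List[Tuple[str, str]]:
        """Flatten retained trajectories into (observation, action) fine-tuning pairs."""
        return [pair for t in self.kept for pair in t.steps]


# Two rounds of toy rollouts from a single client.
buffer = ExperienceBuffer()
round_1 = [
    Trajectory([("start", "go north"), ("hall", "go east")], 1.0),
    Trajectory([("start", "go south")], 0.0),  # failed episode, filtered out
]
round_2 = [Trajectory([("start", "go north")], 1.0)]

kept_1 = buffer.add_filtered(round_1)
kept_2 = buffer.add_filtered(round_2)
pairs = buffer.training_pairs()
```

In this toy run the failed episode contributes nothing to `pairs`, while successful episodes from both rounds remain available for the next local update.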

Paper Structure

This paper contains 29 sections, 14 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: Motivation. Static federated methods limit agent adaptation, while directly introducing online learning into FL suffers from high variance and gradient conflicts. Fed-SE resolves this by stabilizing learning via trajectory filtering and robust subspace aggregation.
  • Figure 2: Overview of the Fed-SE Framework. The framework operates through two distinct phases: local agent self-evolution and global knowledge aggregation. Parallel client agents interact with diverse environments to optimize local low-rank adapters (LoRA) using filtered successful trajectories stored in privacy-preserving experience buffers. The central server aggregates these distributed adapter parameters to construct a global model with generalized reasoning capabilities, which is subsequently synchronized across all clients for the next communication round.
  • Figure 3: Comparative performance evolution across heterogeneous tasks. The plots illustrate the test success rate trajectories over 20 communication rounds. The solid curves represent the continuous improvement of federated methods (Fed-SE and FedAvg), while the horizontal dashed lines indicate the converged baseline performance of static approaches (Local and Centralized). Fed-SE (Blue) exhibits robust growth, consistently breaking the performance ceilings of static baselines and significantly outperforming FedAvg (Red), particularly in complex reasoning environments like Maze.
  • Figure 4: Impact of Key Components on Final Performance. Removing the success filter results in a catastrophic performance drop (-26%), while excluding history or using weighted averaging also degrades performance relative to the robust baseline (66%).
  • Figure 5: Evolution Process Analysis. (a) The Maze task shows that removing history accumulation (w/o History) leads to suboptimal convergence. (b) The Wordle task demonstrates that removing the success filter (w/o Filtering) causes catastrophic performance collapse due to noise injection.
  • ...and 1 more figure
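
The server-side step in the Figure 2 caption (aggregating distributed adapter parameters into a global model) might look roughly like the sketch below. The exact aggregation rule is not given in this excerpt. The uniform, unweighted mean is an assumption, loosely suggested by the Figure 4 caption, which treats weighted averaging as a degraded variant. The parameter names and the dictionary layout are hypothetical. Note also that averaging the two LoRA factors separately is a simplification, since it is not equivalent to averaging their product.

```python
from typing import Dict, List

Matrix = List[List[float]]
Adapter = Dict[str, Matrix]  # hypothetical layout: parameter name -> LoRA factor


def average_adapters(client_adapters: List[Adapter]) -> Adapter:
    """Element-wise unweighted mean of same-shaped adapter matrices across clients."""
    if not client_adapters:
        raise ValueError("need at least one client adapter")
    n = len(client_adapters)
    merged: Adapter = {}
    for name, first in client_adapters[0].items():
        merged[name] = [
            [sum(c[name][i][j] for c in client_adapters) / n for j in range(len(row))]
            for i, row in enumerate(first)
        ]
    return merged


# Two toy clients, each holding one pair of low-rank factors (names are made up).
client_a = {"layer0.lora_A": [[1.0, 2.0]], "layer0.lora_B": [[0.0], [2.0]]}
client_b = {"layer0.lora_A": [[3.0, 4.0]], "layer0.lora_B": [[2.0], [4.0]]}

global_adapter = average_adapters([client_a, client_b])
```

Only the small adapter matrices are exchanged in this sketch; the frozen base model and the clients' trajectories stay where they are, which matches the communication-efficient, privacy-constrained setting the summary describes.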