6GAgentGym: Tool Use, Data Synthesis, and Agentic Learning for Network Management

Jiao Chen, Jianhua Tang, Xiaotong Yang, Zuohong Lv

Abstract

Autonomous 6G network management requires agents that can execute tools, observe the resulting state changes, and adapt their decisions accordingly. Existing benchmarks based on static questions or scripted episode replay, however, do not support such closed-loop interaction, limiting agents to passive evaluation without the ability to learn from environmental feedback. This paper presents 6GAgentGym, a framework that provides this missing closed-loop capability. The framework offers an interactive environment with 42 typed tools whose effect classification distinguishes read-only observation from state-mutating configuration, backed by a learned Experiment Model calibrated on NS-3 simulation data. 6G-Forge bootstraps closed-loop training trajectories from NS-3 seeds via iterative Self-Instruct generation with execution verification against the Experiment Model. Supervised fine-tuning on the resulting corpus, followed by reinforcement learning with online closed-loop interaction, enables an 8B open-source model to achieve an overall success rate comparable to GPT-5's on the accompanying 6GAgentBench, with stronger performance on long-horizon tasks. Together, these components provide a viable path toward autonomous, closed-loop network management.
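The abstract's distinction between read-only observation tools and state-mutating configuration tools can be made concrete with a small sketch. The names below (`ToolEffect`, `TypedTool`, `execute`) are hypothetical illustrations, not the paper's actual API; the sketch only assumes the stated contract that observation tools leave network state untouched while configuration tools mutate it and force the agent to re-observe.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict

class ToolEffect(Enum):
    """Effect classification from the abstract: observation tools are
    read-only, configuration tools mutate network state."""
    READ_ONLY = "observe"
    STATE_MUTATING = "configure"

@dataclass
class TypedTool:
    name: str
    effect: ToolEffect
    fn: Callable[[Dict[str, Any]], Dict[str, Any]]

def execute(tool: TypedTool, state: Dict[str, Any],
            args: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a tool to the network state. Only STATE_MUTATING tools
    may change `state`; READ_ONLY tools just return an observation."""
    result = tool.fn({**state, **args})
    if tool.effect is ToolEffect.STATE_MUTATING:
        state.update(result)   # configuration: commit the mutation
        return {"ack": True}   # agent must re-observe afterwards
    return result              # observation: state is left unchanged
```

Typing tools this way lets an environment enforce the closed loop mechanically: after any `STATE_MUTATING` call, cached observations are stale by construction.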

Paper Structure

This paper contains 34 sections, 6 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Agent interaction model. The agent selects tools in a closed loop until the operator intent is verified. Configuration tools mutate network state, requiring re-observation.
  • Figure 2: Overview of the 6GAgentGym framework. Left: interactive environment with 42 typed tools and Experiment Model. Center: 6G-Forge bootstraps closed-loop trajectories via iterative Self-Instruct generation with execution verification. Right: 6GAgentBench tiered evaluation (L1--L3). Below: Agentic SFT + RL training pipeline.
  • Figure 3: The 6G-Forge data synthesis pipeline. Step 1: NS-3 traces are annotated into seed trajectories. Step 2: A teacher LLM generates new trajectories from seed demonstrations. Step 3: Execution against $M_{\theta}$ produces golden and error-recovery traces. Step 4: Verified trajectories expand the seed pool; Steps 2--4 repeat for $K$ iterations.
  • Figure 4: Visual analysis of 6GAgentBench results. (a) Performance by difficulty tier. (b) GRPO vs. DAPO RL training on the 4B model. (c) SR vs. SPL.
  • Figure 5: 6GAgentGym interactive visualization dashboard. (a) Six-dimensional network metrics with SLA overlays. (b) Decision point identification with type distribution. (c) Per-UAV latency heatmap revealing handover-induced spikes.
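The four-step 6G-Forge loop in Figure 3 (seed annotation, teacher generation, execution verification, pool expansion over $K$ iterations) can be summarized as a short sketch. The function and argument names below (`forge_iterations`, `teacher_generate`, `experiment_model`) are hypothetical stand-ins for the pipeline's components, not the paper's code; the sketch assumes only the loop structure the caption describes.

```python
import random

def forge_iterations(seed_pool, teacher_generate, experiment_model,
                     k_iters=3, n_per_iter=8):
    """Hypothetical sketch of the 6G-Forge loop (Figure 3, Steps 2-4):
    sample seed demonstrations, have a teacher LLM propose trajectories,
    keep those that verify by execution against the Experiment Model,
    and fold the survivors back into the seed pool for K iterations."""
    pool = list(seed_pool)          # Step 1 output: annotated seeds
    all_verified = []
    for _ in range(k_iters):
        demos = random.sample(pool, min(3, len(pool)))
        # Step 2: teacher generates candidates from seed demonstrations
        candidates = teacher_generate(demos, n=n_per_iter)
        # Step 3: execution verification against the Experiment Model
        new_verified = [t for t in candidates if experiment_model(t)]
        all_verified.extend(new_verified)
        pool.extend(new_verified)   # Step 4: expand the seed pool
    return all_verified
```

Because only execution-verified trajectories re-enter the pool, each iteration conditions the teacher on demonstrations that are known to be executable, which is what makes the bootstrapping self-reinforcing rather than drifting.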