CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL

Shinji Mai, Yunpeng Zhai, Ziqian Chen, Cheng Chen, Anni Zou, Shuchang Tao, Zhaoyang Liu, Bolin Ding

TL;DR

The paper tackles task scarcity in agentic RL by introducing CuES, a framework that autonomously synthesizes diverse, executable tasks directly from an environment’s structure without seed data. It blends bottom-up curiosity-driven exploration with lightweight top-down guidance (requirement confirmation and concept pools) to ensure that generated tasks are executable, diverse, and relevant. In experiments on AppWorld, WebShop, and BFCL v3, CuES-synthesized data matches or surpasses manually curated datasets and yields substantial downstream policy gains, outperforming strong baselines. This work demonstrates a scalable way to teach agents what to learn through environment-grounded task synthesis, enabling robust adaptation across domains and interaction protocols.

Abstract

Large language model-based agents are increasingly deployed in complex, tool-augmented environments. While reinforcement learning provides a principled mechanism for such agents to improve through interaction, its effectiveness critically depends on the availability of structured training tasks. In many realistic settings, however, no such tasks exist, a challenge we term task scarcity, which has become a key bottleneck for scaling agentic RL. Existing approaches typically assume predefined task collections, an assumption that fails in novel environments where tool semantics and affordances are initially unknown. To address this limitation, we formalize the problem of Task Generation for Agentic RL, where an agent must learn within a given environment that lacks predefined tasks. We propose CuES, a Curiosity-driven and Environment-grounded Synthesis framework that autonomously generates diverse, executable, and meaningful tasks directly from the environment structure and affordances, without relying on handcrafted seeds or external corpora. CuES drives exploration through intrinsic curiosity, abstracts interaction patterns into reusable task schemas, and refines them through lightweight top-down guidance and memory-based quality control. Across three representative environments, AppWorld, BFCL, and WebShop, CuES produces task distributions that match or surpass manually curated datasets in both diversity and executability, yielding substantial downstream policy improvements. These results demonstrate that curiosity-driven, environment-grounded task generation provides a scalable foundation for agents that not only learn how to act, but also learn what to learn. The code is available at https://github.com/modelscope/AgentEvolver/tree/main/research/CuES.

Paper Structure

This paper contains 22 sections, 16 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: While performance on the benchmarks (AppWorld, WebShop, and BFCL v3) continues to improve with larger LLMs, Qwen2.5 14B under the proposed CuES achieves substantially higher accuracy across all benchmarks.
  • Figure 2: CuES pipeline. (a) Requirement Confirm constructs the concept pool $\tilde{\mathcal{C}}$ and principle $\mathit{P}$ by extracting concepts from the environment description $T_{des}$ and seed goals $\mathcal{G}_{seed}$ and filtering them with the user need $\mathcal{U}$. (b) Curious Exploration executes candidate actions conditioned on $(\mathit{P},\mathcal{G}_{seed})$, consults the environment memory tree to prioritize unseen actions, and emits triples $(s,a,o)$ (Eq. \ref{eq:triple}) as exploration trajectories. (c) Task Abstraction groups consecutive triples within a batch into executable goals with guidelines. (d) Quality Control re-executes each goal. (e) Goal Rewrite progressively exposes guideline hints in the goal text to lower difficulty.
  • Figure 3: Original (left) vs CuES-synthesized (right) data for AppWorld and BFCL v3 Multi-Turn Base.
  • Figure 4: Distribution comparison per environment.
  • Figure 5: Comparison of original and CuES-synthesized data.
  • ...and 2 more figures
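
The Curious Exploration step in the Figure 2 caption (prioritize unseen actions using a memory, emit $(s,a,o)$ triples) can be illustrated with a minimal sketch. This is not the authors' implementation: the flat visit counter `VisitMemory` stands in for the environment memory tree, the toy environment `TOY_ENV` is invented for illustration, and the action choice is a simple least-visited rule rather than an LLM conditioned on $(\mathit{P},\mathcal{G}_{seed})$.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (state, action, observation)


class VisitMemory:
    """Hypothetical, flat stand-in for the environment memory tree.

    It only counts how often each (state, action) pair has been tried.
    """

    def __init__(self) -> None:
        self.counts: Dict[Tuple[str, str], int] = defaultdict(int)

    def count(self, state: str, action: str) -> int:
        return self.counts[(state, action)]

    def record(self, state: str, action: str) -> None:
        self.counts[(state, action)] += 1


def explore(
    start_state: str,
    candidates: Callable[[str], List[str]],
    step: Callable[[str, str], Tuple[str, str]],
    memory: VisitMemory,
    max_steps: int,
) -> List[Triple]:
    """Roll out one trajectory, always preferring the least-visited action."""
    state = start_state
    trajectory: List[Triple] = []
    for _ in range(max_steps):
        actions = candidates(state)
        if not actions:
            break
        # Unseen actions have count 0, so they are chosen first;
        # ties are broken by the order in which candidates are listed.
        action = min(actions, key=lambda a: memory.count(state, a))
        memory.record(state, action)
        next_state, observation = step(state, action)
        trajectory.append((state, action, observation))
        state = next_state
    return trajectory


# Toy environment (an assumption for illustration):
# state -> {action: (next_state, observation)}
TOY_ENV: Dict[str, Dict[str, Tuple[str, str]]] = {
    "home": {
        "search": ("results", "3 items listed"),
        "open_cart": ("cart", "cart is empty"),
    },
    "results": {
        "back": ("home", "returned home"),
        "view_item": ("item", "item details shown"),
    },
    "item": {"back": ("home", "returned home")},
    "cart": {"back": ("home", "returned home")},
}

trajectory = explore(
    start_state="home",
    candidates=lambda s: list(TOY_ENV[s]),
    step=lambda s, a: TOY_ENV[s][a],
    memory=VisitMemory(),
    max_steps=6,
)

for triple in trajectory:
    print(triple)
```

Because the memory persists across steps, the second visit to `home` picks `open_cart` instead of repeating `search`, which is the behavior the caption describes as prioritizing unseen actions. In the pipeline described above, the resulting triples would then be grouped by Task Abstraction into executable goals; that step is not sketched here.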