TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training

Jinluan Yang, Yuxin Liu, Zhengyu Chen, Chengcheng Han, Yueqing Sun, Qi Gu, Hui Su, Xunliang Cai, Fei Wu, Kun Kuang

TL;DR

This work proposes TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology that explicitly captures how tool invocations and environmental responses drive the divergence between effective strategies and failure modes.

Abstract

Training tool-use agents typically relies on outcome-based filtering: Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. However, this paradigm ignores interaction dynamics: successful trajectories may lack error recovery or exhibit redundancy, while pass rates fail to distinguish structurally informative tasks from trivial ones. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology. By merging equivalent action-observation states, this projection transforms scattered linear trajectories into a structured manifold that explicitly captures how tool invocations and environmental responses drive the divergence between effective strategies and failure modes. Leveraging this representation, we introduce a dual-selection mechanism: for SFT, we prioritize trajectories demonstrating reflective recovery, semantic efficiency, and strategic diversity to mitigate covariate shift and mode collapse; for RL, we select tasks with high error branch ratios and strategic heterogeneity, maximizing gradient Signal-to-Noise Ratio to address vanishing signals in sparse-reward settings. Evaluations on BFCLv3 and Tau2 Bench show that TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines. We will release the code and data soon for further investigations.
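
To make the merging step concrete, here is a minimal sketch of how several rollouts of one task might be folded into a single state-transition graph. It is not the authors' implementation: the rollout format, the `build_quotient_graph` and `error_branch_ratio` names, the exact-match state key (standing in for semantic equivalence), and the ratio's definition are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rollout format: each turn is (tool_name, observation_status).
# Exact-match keys stand in for the semantic equivalence described in the abstract.
def build_quotient_graph(rollouts):
    """Merge turns with identical keys across rollouts into shared nodes."""
    edges = defaultdict(int)  # (src_state, dst_state) -> traversal count
    for turns in rollouts:
        prev = "START"
        for tool, status in turns:
            state = (tool, status)
            edges[(prev, state)] += 1
            prev = state
    return dict(edges)

def error_branch_ratio(edges):
    """Share of distinct edges that lead into an error state (assumed definition)."""
    if not edges:
        return 0.0
    error_edges = sum(1 for (_, dst) in edges if dst[1] == "error")
    return error_edges / len(edges)

# Three toy rollouts of the same task; tool names are invented for the example.
rollouts = [
    [("search_flights", "ok"), ("book_flight", "ok")],
    [("search_flights", "ok"), ("book_flight", "error"), ("book_flight", "ok")],
    [("get_user", "ok"), ("search_flights", "ok"), ("book_flight", "ok")],
]
graph = build_quotient_graph(rollouts)
ratio = error_branch_ratio(graph)
```

In this toy case the three linear rollouts collapse into six distinct edges, one of which enters an error state. The second rollout's path through that error node and back to a successful call is the kind of recovery branch that outcome-only filtering cannot see.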

Paper Structure (71 sections, 26 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Overview of the TopoCurate Framework. Our method operates in three systematic stages: (Left) Topological Modeling transforms disjoint rollouts into a unified state-transition graph by defining states via action-observation tuples and aggregating semantically equivalent turns; (Middle) Trajectory Selection for SFT applies three process-aware metrics—Reflective Recovery (resilience), Semantic Efficiency (economy), and Strategic Diversity (exploration)—to prioritize high-quality trajectories; (Right) Task Selection for RL evaluates tasks using Error Branch Ratio and Strategy Heterogeneity to select structurally complex tasks that maximize gradient efficiency. This unified topological view enables rigorous curation beyond simple outcome filtering.
  • Figure 2: Pass@k Comparison. We compare three settings: the base Qwen3-Instruct backbone (Thinking mode), a baseline SFT model trained on standard outcome-filtered data (w/o Topology), and our proposed TopoCurate-SFT. The results demonstrate that TopoCurate consistently pushes the agentic capability boundary further across all domains and model scales.
  • Figure 3: Model Behavior Analysis. We compare the behavioral patterns of TopoCurate-SFT (Ours) vs. the baseline SFT (w/o Topology). (a) Ours demonstrates significantly higher reflective recovery rates, particularly in the complex Telecom domain. (b) In terms of the number of interaction turns (lower is better), Ours achieves higher efficiency overall, while maintaining success rate across all domains. (c) Ours consistently generates more diverse tool chains (Unique Chain Ratio), mitigating mode collapse.
  • Figure 4: Evaluation Accuracy on Tau2 for RL. (Left) Impact of SFT initialization quality on downstream RL performance across Airline, Retail, and Telecom domains. Models initialized with topological SFT exhibit consistently higher accuracy. (Right) Impact of RL task selection on final performance. Topologically-selected tasks lead to superior convergence across all domains, with the most pronounced gains in Telecom.
  • Figure 5: Training Reward Curves for RL. (Left) Comparison of models initialized with topologically-curated SFT (w/ Topology) vs. outcome-filtered SFT (w/o Topology), using the same RL task pool. (Right) Comparison of models trained on topologically-selected RL tasks (w/ Topology) vs. uniformly-sampled tasks (w/o Topology) from the same SFT checkpoint. Topological curation improves both initialization quality and task selection efficiency.
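
Two of the quantities named in the captions above can be sketched in a few lines. The `pass_at_k` function uses the standard unbiased estimator 1 - C(n-c, k)/C(n, k); whether the paper computes Pass@k this way is an assumption. The `unique_chain_ratio` definition (distinct tool-call sequences divided by total trajectories) is likewise an assumed reading of the Unique Chain Ratio in Figure 3, and both function names are hypothetical.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n sampled trials with c successes (k <= n)."""
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def unique_chain_ratio(tool_chains):
    """Distinct tool-call sequences divided by total trajectories (assumed definition)."""
    if not tool_chains:
        return 0.0
    return len({tuple(chain) for chain in tool_chains}) / len(tool_chains)

# Toy inputs, invented for the example.
p = pass_at_k(n=4, c=1, k=1)
u = unique_chain_ratio([["a", "b"], ["a", "b"], ["b"]])
```

A ratio near 1 would mean almost every trajectory follows a different tool chain, while a ratio near 0 would point to the mode collapse the caption describes.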