ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation
Jingqi Zhou, Sheng Wang, DeZhao Deng, Junwen Lu, Junwei Su, Qintong Li, Jiahui Gao, Hao Wu, Jiyue Jiang, Lingpeng Kong, Chuan Wu
TL;DR
ToolSelf addresses the rigidity of static configurations in agentic systems by unifying task execution and self-reconfiguration in a single action space, defined by per-stage configurations $\mathcal{C}_i=(q_i,\sigma_i,T_i,K_i)$ and a callable reconfiguration tool. It introduces Configuration-Aware Two-stage Training (CAT), which combines Rejection Sampling Fine-tuning (Stage I) with Kahneman-Tversky Optimization (Stage II) to train both the inference agent $\pi_i$ and the reconfiguration engine $\mu$ end to end via trajectory-level credit assignment. Across FRAMES, xbench, GAIA, and SWE-bench Lite, ToolSelf rivals specialized workflows and generalizes to new tasks, with a 24.1% average performance gain and substantial improvements over baselines at multiple scales. The work demonstrates a practical path toward self-adaptive, end-to-end trainable agentic systems that autonomously manage macro-goals, toolbox selection, context, and sub-goals during long-horizon reasoning.
Abstract
Agentic systems powered by Large Language Models (LLMs) have demonstrated remarkable potential in tackling complex, long-horizon tasks. However, their efficacy is fundamentally constrained by the static configurations that govern agent behavior, which are fixed prior to execution and fail to adapt to evolving task dynamics. Existing approaches, which rely on manual orchestration or heuristic-based patches, often suffer from poor generalization and fragmented optimization. To overcome these limitations, we propose ToolSelf, a novel paradigm enabling tool-driven runtime self-reconfiguration. By abstracting configuration updates as a callable tool, ToolSelf unifies task execution and self-adjustment into a single action space, shifting configuration control from external rules to intrinsic parameters. Agents can thereby autonomously update their sub-goals and context based on task progression, and correspondingly adapt their strategy and toolbox, transforming from passive executors into managers of both task and self. We further devise Configuration-Aware Two-stage Training (CAT), which combines rejection sampling fine-tuning with trajectory-level reinforcement learning to internalize this meta-capability. Extensive experiments across diverse benchmarks demonstrate that ToolSelf rivals specialized workflows while generalizing to novel tasks, achieving a 24.1% average performance gain and illuminating a path toward truly self-adaptive agents.
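To make the idea of a single action space concrete, the following is a minimal Python sketch in which a reconfiguration call is dispatched through the same interface as ordinary task tools. It is an illustration under stated assumptions, not the authors' implementation: the names `Config`, `TASK_TOOLS`, `RECONFIGURE`, and `step` are hypothetical, the two toy tools are placeholders, and the four fields follow the abstract's wording (sub-goal, strategy, toolbox, context) without claiming which field corresponds to which symbol in $\mathcal{C}_i=(q_i,\sigma_i,T_i,K_i)$.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical per-stage configuration (field names are assumptions)."""
    sub_goal: str
    strategy: str
    toolbox: Tuple[str, ...]
    context: str


# Toy task tools, placeholders for real search/code/browser tools.
TASK_TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}

# Name of the reconfiguration tool; it shares the action space with task tools.
RECONFIGURE = "reconfigure"


def step(config: Config, action: str, payload: Any) -> Tuple[Config, str]:
    """Apply one action and return (possibly updated config, observation)."""
    if action == RECONFIGURE:
        # payload is a dict of configuration fields to overwrite.
        return replace(config, **payload), "configuration updated"
    if action not in config.toolbox:
        # Task tools outside the current toolbox are rejected, not executed.
        return config, f"tool {action!r} not in current toolbox"
    return config, TASK_TOOLS[action](payload)
```

In this sketch, an agent that starts with only `search` in its toolbox cannot call `calculator` until it first issues a `reconfigure` action that swaps the toolbox. That mirrors the abstract's description of an agent adapting its own toolbox as the task progresses, although the real system's reconfiguration engine $\mu$ is a trained model rather than a field overwrite.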
