ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation

Jingqi Zhou; Sheng Wang; DeZhao Deng; Junwen Lu; Junwei Su; Qintong Li; Jiahui Gao; Hao Wu; Jiyue Jiang; Lingpeng Kong; Chuan Wu

TL;DR

ToolSelf addresses the rigidity of static configurations in agentic systems by unifying task execution and self-reconfiguration into a single action space defined by per-stage configurations $\mathcal{C}_i=(q_i,\sigma_i,T_i,K_i)$ and a callable reconfiguration tool. It introduces Configuration-Aware Two-stage Training (CAT), combining Rejection Sampling Fine-tuning (Stage I) and Kahneman-Tversky Optimization (Stage II) to end-to-end train both the inference agent $\pi_i$ and the reconfiguration engine $\mu$ via trajectory-level credit assignment. Empirical results across FRAMES, xbench, GAIA, and SWE-bench Lite show ToolSelf rivals specialized workflows and generalizes to new tasks, achieving a 24.1% average performance gain and substantial improvements over baselines at multiple scales. The work demonstrates a practical path toward self-adaptive, end-to-end trainable agentic systems that autonomously manage macro-goals, toolbox selection, context, and sub-goals during long-horizon reasoning.

Abstract

Agentic systems powered by Large Language Models (LLMs) have demonstrated remarkable potential in tackling complex, long-horizon tasks. However, their efficacy is fundamentally constrained by static configurations governing agent behaviors, which are fixed prior to execution and fail to adapt to evolving task dynamics. Existing approaches, relying on manual orchestration or heuristic-based patches, often struggle with poor generalization and fragmented optimization. To transcend these limitations, we propose ToolSelf, a novel paradigm enabling tool-driven runtime self-reconfiguration. By abstracting configuration updates as a callable tool, ToolSelf unifies task execution and self-adjustment into a single action space, achieving a phase transition from external rules to intrinsic parameters. Agents can thereby autonomously update their sub-goals and context based on task progression, and correspondingly adapt their strategy and toolbox, transforming from passive executors into dual managers of both task and self. We further devise Configuration-Aware Two-stage Training (CAT), combining rejection sampling fine-tuning with trajectory-level reinforcement learning to internalize this meta-capability. Extensive experiments across diverse benchmarks demonstrate that ToolSelf rivals specialized workflows while generalizing to novel tasks, achieving a 24.1% average performance gain and illuminating a path toward truly self-adaptive agents.
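The two CAT stages described above can be illustrated by how training data might be derived from sampled trajectories. The following is a minimal sketch, not the authors' pipeline: the `Step` and `Trajectory` types, the `role` field, and both function names are hypothetical, and it assumes that trajectory-level credit means every step simply inherits the binary outcome of its whole trajectory. Stage I keeps only steps from successful trajectories (rejection sampling); Stage II emits unpaired desirable/undesirable labels of the kind Kahneman-Tversky Optimization consumes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    role: str    # hypothetical tag: "agent" (inference agent) or "engine" (reconfiguration engine)
    prompt: str
    output: str


@dataclass
class Trajectory:
    steps: List[Step]
    success: bool  # outcome of the whole trajectory, e.g. final answer judged correct


def stage1_sft_data(trajs: List[Trajectory]) -> List[Tuple[str, str]]:
    """Rejection sampling: keep (prompt, output) pairs from successful trajectories only."""
    return [(s.prompt, s.output) for t in trajs if t.success for s in t.steps]


def stage2_kto_data(trajs: List[Trajectory]) -> List[Tuple[str, str, bool]]:
    """Assumed trajectory-level credit: each step inherits its trajectory's outcome
    as an unpaired desirable (True) / undesirable (False) label."""
    return [(s.prompt, s.output, t.success) for t in trajs for s in t.steps]


# Toy data: one successful trajectory with an agent step and an engine step,
# and one failed trajectory with a single agent step.
trajs = [
    Trajectory([Step("agent", "p1", "search"), Step("engine", "p2", "new config")], True),
    Trajectory([Step("agent", "p3", "guess")], False),
]
sft = stage1_sft_data(trajs)
kto = stage2_kto_data(trajs)
```

Because agent steps and engine steps share one trajectory outcome in this sketch, both components receive the same training signal, which is one simple way to read "end-to-end" joint training.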

Paper Structure (40 sections, 13 equations, 1 figure, 8 tables)

Introduction
Method
Preliminaries: Pre-Fixed Configuration Paradigm
The ToolSelf Paradigm
Configuration-Aware Two-stage Training
Design Advantages
Experiments
Experimental Setup
Main Results
Configuration-Aware Two-stage Training
Ablation Study
Related Work
Conclusion
Appendix
Additional Experimental Results
...and 25 more sections

Figures (1)

Figure 1: Illustration of ToolSelf. (a) Multi-Agent Workflows rely on manual priors with poor generalization; (b) Single-Agent Extensions apply fragmented patches via external heuristic mechanisms; (c) ToolSelf unifies task execution and self-reconfiguration into a single action space, achieving intrinsic and learnable adaptation. (d) The system operates through an inference-reconfiguration loop. In each stage $i$, the inference agent $\pi_i$ executes tasks under dynamic configuration $\mathcal{C}_i = (q_i, \sigma_i, T_i, K_i)$, comprising sub-goals, strategy, toolbox, and context. When the agent determines that reconfiguration is needed, it invokes the reconfiguration tool $T_{\text{reconfig}}$, which triggers the reconfiguration engine $\mu$ to generate an updated configuration $\mathcal{C}_{i+1}$ for the next stage. By unifying task execution and self-configuration into a single action space, ToolSelf achieves autonomous triggering (deciding when), intent-driven adaptation (specifying how), and joint optimization (learning end-to-end), thereby transforming agents from passive executors into dual managers of both task and self-configuration.
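The inference-reconfiguration loop in panel (d) can be sketched as a small control loop. This is an illustrative sketch under stated assumptions rather than the paper's implementation: the `Config` fields mirror $(q_i, \sigma_i, T_i, K_i)$, but the tool names `"reconfig"` and `"finish"`, the `Action` type, the stub agent and engine, and the choice to clear the step history at each stage boundary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

RECONFIG = "reconfig"  # hypothetical name for the reconfiguration tool
FINISH = "finish"      # hypothetical name for the terminating tool


@dataclass
class Config:
    sub_goal: str             # q_i
    strategy: str             # sigma_i
    toolbox: Tuple[str, ...]  # T_i
    context: str              # K_i


@dataclass
class Action:
    tool: str
    argument: str


Agent = Callable[[Config, List[Action]], Action]
Engine = Callable[[Config, List[Action], str], Config]


def run(agent: Agent, engine: Engine, config: Config,
        max_steps: int = 20) -> Tuple[Optional[str], List[Config]]:
    """Run the loop; return the final answer (or None) and the per-stage configurations."""
    stages = [config]
    history: List[Action] = []
    for _ in range(max_steps):
        action = agent(config, history)
        if action.tool not in config.toolbox:
            raise ValueError(f"tool {action.tool!r} is not in the current toolbox")
        history.append(action)
        if action.tool == FINISH:
            return action.argument, stages
        if action.tool == RECONFIG:
            # The agent decides when to reconfigure and passes its intent;
            # the engine produces the configuration for the next stage.
            config = engine(config, history, action.argument)
            stages.append(config)
            history = []  # assumption: a new stage starts from the updated context only
    return None, stages


def toy_agent(config: Config, history: List[Action]) -> Action:
    # Stub policy: search once, then ask for reconfiguration, then finish.
    if config.sub_goal == "gather evidence":
        if not history:
            return Action("search", "query")
        return Action(RECONFIG, "evidence gathered; switch to answering")
    return Action(FINISH, "done")


def toy_engine(config: Config, history: List[Action], intent: str) -> Config:
    # Stub engine: new sub-goal, strategy, and toolbox; intent is folded into the context.
    return Config("write answer", "concise", (FINISH, RECONFIG),
                  config.context + " | " + intent)


start = Config("gather evidence", "broad search", ("search", RECONFIG), "task: demo")
answer, stages = run(toy_agent, toy_engine, start)
```

Treating reconfiguration as just another entry in the toolbox is what places it in the same action space as ordinary tool calls: the loop needs no external rule for when to reconfigure, only a branch on which tool the agent chose.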
