Self-Guided Function Calling in Large Language Models via Stepwise Experience Recall

Sijia Cui, Aiyao He, Shuai Xu, Hongming Zhang, Yanna Wang, Qingyang Zhang, Yajing Wang, Bo Xu

TL;DR

This paper introduces SEER (Stepwise Experience Recall), a self-guided, online framework for improving multi-step tool use in LLMs. SEER continually builds an experience pool from trajectory-based experiences, recalls past successful trajectories with a three-component scoring function that considers trajectory similarity, toolchain coverage, and intent alignment, and updates the pool through a self-evaluator. On ToolQA, SEER improves average accuracy by $6.1\%$ on easy and $4.7\%$ on hard questions; on $\tau$-bench, it yields substantial gains, most notably with the $72$B model. The results also show online self-improvement across batches, and ablations show the impact of each retrieval component. The approach reduces reliance on manual demonstrations, scales with tool diversity, and has practical implications for deploying tool-augmented LLM agents in real-world domains, though it assumes a fixed retrieval weighting and faces memory-diversity limitations.
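The three-component recall score described above can be illustrated with a minimal sketch. This summary does not give the actual formulas, so everything concrete here is an assumption: cosine similarity over hypothetical embedding vectors stands in for trajectory similarity, the fraction of already-used tools found in a stored trajectory stands in for toolchain coverage, an exact label match stands in for intent alignment, and the fixed weights `(0.5, 0.3, 0.2)` are placeholders rather than values from the paper.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Experience:
    """A stored successful trajectory (all fields are hypothetical)."""
    embedding: list[float]  # vector representation of the trajectory
    tools: list[str]        # tools called along the trajectory
    intent: str             # intent label of the original task


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coverage(used_tools: list[str], candidate_tools: list[str]) -> float:
    """Fraction of the tools used so far that the candidate also uses."""
    used = set(used_tools)
    if not used:
        return 0.0
    return len(used & set(candidate_tools)) / len(used)


def score(query_emb, used_tools, intent, exp, weights=(0.5, 0.3, 0.2)):
    """Fixed-weight sum of the three components (weights are assumed)."""
    w_sim, w_cov, w_int = weights
    return (
        w_sim * cosine(query_emb, exp.embedding)
        + w_cov * coverage(used_tools, exp.tools)
        + w_int * (1.0 if intent == exp.intent else 0.0)
    )


def recall(pool, query_emb, used_tools, intent, k=2):
    """Return the k highest-scoring experiences from the pool."""
    ranked = sorted(
        pool,
        key=lambda exp: score(query_emb, used_tools, intent, exp),
        reverse=True,
    )
    return ranked[:k]
```

In a stepwise setting, `recall` would be called again at each decision step with the updated history, so the retrieved examples can change as the toolchain grows.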

Abstract

Function calling enables large language models (LLMs) to interact with external systems by leveraging tools and APIs. When faced with multi-step tool usage, LLMs still struggle with tool selection, parameter generation, and tool-chain planning. Existing methods typically rely on manually designing task-specific demonstrations or retrieving from a curated library. These approaches demand substantial expert effort, and prompt engineering becomes increasingly complex and inefficient as tool diversity and task difficulty scale. To address these challenges, we propose a self-guided method, Stepwise Experience Recall (SEER), which performs fine-grained, stepwise retrieval from a continually updated experience pool. Instead of relying on a static or manually curated library, SEER incrementally augments the experience pool with past successful trajectories, enabling continuous expansion of the pool and improved model performance over time. Evaluated on the ToolQA benchmark, SEER achieves an average improvement of 6.1% on easy and 4.7% on hard questions. We further test SEER on $\tau$-bench, which includes two real-world domains. Powered by Qwen2.5-7B and Qwen2.5-72B models, SEER demonstrates substantial accuracy gains of 7.44% and 23.38%, respectively.
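The continual accumulation of successful trajectories can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' implementation: `solve` and `evaluate` are hypothetical callables standing in for the LLM agent and the evaluator, and the pool is modeled as a plain list of `(task, trajectory)` pairs.

```python
from typing import Callable


def accumulate(
    tasks: list[str],
    solve: Callable[[str, list], list[str]],
    evaluate: Callable[[str, list[str]], bool],
    pool: list,
) -> list:
    """Process tasks in order, growing the pool with successful trajectories.

    solve(task, pool) is a hypothetical agent that may consult the pool;
    evaluate(task, trajectory) is a hypothetical success judge.
    """
    for task in tasks:
        trajectory = solve(task, pool)
        # Only trajectories judged successful are stored for later recall.
        if evaluate(task, trajectory):
            pool.append((task, trajectory))
    return pool
```

Because the pool is updated between tasks, later tasks can draw on experiences gathered from earlier ones, which is the mechanism behind the reported improvement over time.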

Paper Structure

This paper contains 22 sections, 1 equation, 6 figures, 6 tables, 2 algorithms.

Figures (6)

  • Figure 1: Overview of the SEER framework. The core component is the stepwise experience recall (left), which retrieves relevant trajectories from the experience pool based on the current interaction history $H_t$, and returns the top-$k$ examples $\mathcal{D}_{\text{recall}}$ to guide the LLM’s next decision. The continual experience accumulation mechanism (right) updates the experience pool by identifying successful trajectories using an evaluator.
  • Figure 2: The self-improvement of SEER. The red solid line represents SEER's average accuracy per batch. The blue dashed line represents a 3-point moving average.
  • Figure 3: Accuracy of SEER and its ablated variants, showing the impact of each retrieval component.
  • Figure 4: The illustration of the intent recognizer.
  • Figure 5: The illustration of the evaluator.
  • ...and 1 more figure
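The 3-point moving average mentioned in the Figure 2 caption can be computed as in the sketch below. The caption does not say whether the window is trailing or centered; a trailing window that shortens at the start of the series is assumed here, and any input values are illustrative rather than results from the paper.

```python
def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Trailing moving average; early points average over fewer values."""
    smoothed = []
    for i in range(len(values)):
        # Take up to `window` values ending at position i.
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Applied to per-batch accuracy, this smooths batch-to-batch noise so that an upward trend is easier to see.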