SMART: Self-Aware Agent for Tool Overuse Mitigation

Cheng Qian, Emre Can Acikgoz, Hongru Wang, Xiusi Chen, Avirup Sil, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji

TL;DR

Tool Overuse is a fundamental inefficiency in LLM-driven reasoning where external tools are invoked even when internal knowledge suffices. SMART introduces metacognitive reasoning to dynamically balance parametric knowledge and tool use, guided by the SMART-ER dataset, which provides multi-domain supervision for instruction tuning SMARTAgent. Empirical results show substantial reductions in tool usage and notable accuracy gains: the 7B-scale SMARTAgent rivals much larger baselines and generalizes well to out-of-distribution (OOD) data. This approach offers a path toward resource-efficient, capable agents that intelligently allocate computational effort between thinking and external retrieval.

Abstract

Current Large Language Model (LLM) agents demonstrate strong reasoning and tool-use capabilities, but often lack self-awareness, failing to balance these approaches effectively. This imbalance leads to Tool Overuse, where models unnecessarily rely on external tools for tasks solvable with parametric knowledge, increasing computational overhead. Inspired by human metacognition, we introduce SMART (Strategic Model-Aware Reasoning with Tools), a paradigm that enhances an agent's self-awareness to optimize task handling and reduce tool overuse. To support this paradigm, we introduce SMART-ER, a dataset spanning three domains, where reasoning alternates between parametric knowledge and tool-dependent steps, with each step enriched by rationales explaining when tools are necessary. Through supervised training, we develop SMARTAgent, a family of models that dynamically balance parametric knowledge and tool use. Evaluations show that SMARTAgent reduces tool use by 24% while improving performance by over 37%, enabling 7B-scale models to match their 70B counterparts and GPT-4o. Additionally, SMARTAgent generalizes to out-of-distribution test data like GSM8K and MINTQA, maintaining accuracy with just one-fifth the tool calls. These results highlight the potential of strategic tool use to enhance reasoning, mitigate overuse, and bridge the gap between model size and performance, advancing intelligent and resource-efficient agent designs.
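
To make the idea of a reasoning chain that alternates between knowledge-driven and tool-dependent steps more concrete, here is a minimal Python sketch. It is an illustration only: the `Step` class, its field names, the example chains, and the helper functions are hypothetical and are not the actual SMART-ER schema or the paper's evaluation code. It shows one way such a chain could be represented, with a rationale per step, and how a relative reduction in tool calls could be computed.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass
class Step:
    """One reasoning step (hypothetical structure, not the SMART-ER format)."""
    kind: Literal["knowledge", "tool"]  # answered from memory vs. via an external tool
    content: str                        # what the step does
    rationale: str                      # why a tool is or is not needed here


def count_tool_calls(chain: List[Step]) -> int:
    """Count the steps in a chain that invoke an external tool."""
    return sum(1 for step in chain if step.kind == "tool")


def relative_reduction(baseline_calls: int, new_calls: int) -> float:
    """Fraction of tool calls saved relative to a baseline chain."""
    if baseline_calls <= 0:
        raise ValueError("baseline_calls must be positive")
    return (baseline_calls - new_calls) / baseline_calls


# A chain that calls a tool for every step, including a slow-changing fact.
overusing_chain = [
    Step("tool", "Search for the company's CEO.",
         "No self-assessment; a tool is called by default."),
    Step("tool", "Search for the company's latest chip.",
         "No self-assessment; a tool is called by default."),
]

# A chain that uses a tool only where parametric knowledge is likely stale.
balanced_chain = [
    Step("knowledge", "Recall the company's CEO.",
         "Slow-changing fact; internal knowledge is likely sufficient."),
    Step("tool", "Search for the company's latest chip.",
         "Fast-changing fact; internal knowledge may be outdated."),
]

saved = relative_reduction(count_tool_calls(overusing_chain),
                           count_tool_calls(balanced_chain))
print(f"Tool calls saved: {saved:.0%}")
```

The per-step rationale is kept as plain text in this sketch because the abstract describes rationales as explanations attached to each step; any richer encoding would be a further assumption.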

Paper Structure

This paper contains 48 sections, 3 equations, 7 figures, 11 tables.

Figures (7)

  • Figure 1: An illustration of human metacognition: The user recalls Tim Cook’s role from prior knowledge (a slow-changing fact), but uses online search to find the latest chip info (a fast-changing fact).
  • Figure 2: Statistics on Llama and Mistral's tool overuse.
  • Figure 3: Example cases on XAgent and AgentGPT's tool overuse.
  • Figure 4: Three example queries and their reasoning chains from each domain. The inherent compositionality of a query naturally divides reasoning into knowledge-driven steps and tool-reliant steps.
  • Figure 5: The data pipeline to get SMART-ER. We divide the whole pipeline into several stages for better control and quality of the generated reasoning chain.
  • ...and 2 more figures