The Tool Illusion: Rethinking Tool Use in Web Agents

Renze Lou, Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Suman Nath, Wenpeng Yin, Jianfeng Gao

Abstract

As web agents rapidly evolve, an increasing body of work has moved beyond conventional atomic browser interactions and explored tool use as a higher-level action paradigm. Although prior studies have shown the promise of tools, their conclusions are often drawn from limited experimental scales and sometimes non-comparable settings. As a result, several fundamental questions remain unclear: i) whether tools provide consistent gains for web agents, ii) what practical design principles characterize effective tools, and iii) what side effects tool use may introduce. To establish a stronger empirical foundation for future research, we revisit tool use in web agents through an extensive and carefully controlled study across diverse tool sources, backbone models, tool-use frameworks, and evaluation benchmarks. Our findings both revise some prior conclusions and complement others with broader evidence. We hope this study provides a more reliable empirical basis and inspires future research on tool-use web agents.

Paper Structure

This paper contains 24 sections, 9 figures, and 7 tables.

Figures (9)

  • Figure 1: Distribution of tool complexity levels across the three frameworks.
  • Figure 2: Distribution of tool invocations. "X invocations" indicates that the tool is invoked in X tasks of WebArena, e.g., "0 invocations" means the tool is unused across all tasks.
  • Figure 3: Average token cost per website on WebArena.
  • Figure 4: Average number of agent steps required to complete a task on WebArena.
  • Figure 5: Comparison of tool and skill in SkillWeaver.
  • ...and 4 more figures