OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

Pan Lu, Bowen Chen, Sheng Liu, Rahul Thapa, Joseph Boen, James Zou

TL;DR

OctoTools presents a training-free, extensible agentic framework that orchestrates diverse external tools through a structured planner-executor workflow. By introducing standardized tool cards and a task-specific toolset optimization, it enables multi-step reasoning across 16 benchmarks with substantial gains over GPT-4o and other agent frameworks. The work provides in-depth analyses of planning, tool usage, and decomposition strategies, and demonstrates robustness to weaker LLMs. Together, these contributions offer a modular, transparent path toward more capable and general AI agents for complex problem solving.

Abstract

Solving complex reasoning tasks may involve visual understanding, domain knowledge retrieval, numerical calculation, and multi-step reasoning. Existing methods augment large language models (LLMs) with external tools but are restricted to specialized domains or limited tool types, or require additional training data. In this paper, we introduce OctoTools, a training-free, user-friendly, and easily extensible open-source agentic framework designed to tackle complex reasoning across diverse domains. OctoTools introduces standardized tool cards to encapsulate tool functionality, a planner for both high-level and low-level planning, and an executor to carry out tool usage. We validate OctoTools' generality across 16 diverse tasks (including MathVista, MMLU-Pro, MedQA, and GAIA-Text), achieving substantial average accuracy gains of 9.3% over GPT-4o. Furthermore, OctoTools outperforms AutoGen, GPT-Functions, and LangChain by up to 10.6% when given the same set of tools. Through comprehensive analysis and ablations, OctoTools demonstrates advantages in task planning, effective tool usage, and multi-step problem solving.
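
To make the three components named in the abstract concrete, the following is a minimal Python sketch of a planner-executor loop over tool cards. It is an illustration under stated assumptions, not the OctoTools implementation: the names `ToolCard`, `Context`, `plan_next_step`, `execute`, `solve`, and the `adder` tool are all hypothetical, and the planner is a fixed rule standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ToolCard:
    """Hypothetical tool card: usage metadata plus the callable it wraps."""
    name: str
    description: str
    run: Callable[[str], str]


@dataclass
class Context:
    """Trajectory of (tool name, command, result) records for one query."""
    query: str
    steps: List[Tuple[str, str, str]] = field(default_factory=list)


def add_numbers(expression: str) -> str:
    """Toy tool: sums numbers separated by '+', e.g. '2 + 3'."""
    return str(sum(float(part) for part in expression.split("+")))


def plan_next_step(ctx: Context, tools: Dict[str, ToolCard]) -> Optional[Tuple[str, str]]:
    """Stand-in for the planner (assumption: a real one would prompt an LLM
    with the query, the tool cards, and the context so far).

    Returns (tool name, sub-goal), or None when no further step is needed.
    """
    if not ctx.steps and "adder" in tools:
        return "adder", ctx.query
    return None


def execute(tool: ToolCard, sub_goal: str) -> Tuple[str, str]:
    """Stand-in for the executor: turns a sub-goal into a command and runs it."""
    command = sub_goal.strip()
    return command, tool.run(command)


def solve(query: str, tools: Dict[str, ToolCard], max_steps: int = 5) -> str:
    """Alternate planning and execution, storing each result in the context."""
    ctx = Context(query)
    for _ in range(max_steps):
        action = plan_next_step(ctx, tools)
        if action is None:
            break
        name, sub_goal = action
        command, result = execute(tools[name], sub_goal)
        ctx.steps.append((name, command, result))
    # A full system would summarize the whole trajectory; this sketch
    # simply returns the last stored result.
    return ctx.steps[-1][2] if ctx.steps else ""


TOOLS = {
    "adder": ToolCard("adder", "Adds numbers written as 'a + b'.", add_numbers),
}

if __name__ == "__main__":
    print(solve("2 + 3", TOOLS))
```

In this sketch, adding a tool only means registering another `ToolCard` in `TOOLS`; the loop in `solve` does not change, which is the kind of extensibility the abstract attributes to tool cards.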

Paper Structure (74 sections, 7 equations, 18 figures, 3 tables, 1 algorithm)

Figures (18)

  • Figure 1: The framework of OctoTools. (1) Tool cards define tool-usage metadata and encapsulate tools, enabling integration of new tools without additional training or framework refinement. (2) The planner governs both high-level and low-level planning to address the global objective and refine actions step by step. (3) The executor instantiates tool calls by generating executable commands and saves structured results in the context. The final answer is summarized from the full trajectory in the context. Furthermore, the task-specific toolset optimization algorithm learns to select a beneficial subset of tools for downstream tasks. See Figure 3 for an example.
  • Figure 2: Performance comparison across 16 benchmarks. Our OctoTools framework achieves an average accuracy gain of 9.3% over GPT-4o without function plugins and 7.3% over LangChain, using the same tools under the same configuration.
  • Figure 3: A self-contained example illustrating the framework in Figure 1. We visualize the tool cards for selected tools, the initial plan generated by the planner, and two steps in which the planner and the executor orchestrate low-level planning and tool usage before arriving at the final answer. See the appendix for details and more examples. An interactive visualization of these examples is available at https://octotools.github.io/#visualization.
  • Figure 4: (a) Tool usage distribution in our OctoTools framework and agent baselines, averaged over 16 tasks. (b) Tool usage distribution across 16 tasks in OctoTools. OctoTools takes advantage of different external tools to address task-specific challenges.
  • Figure 5: Benchmark distribution across average number of steps and fraction of external tool usage (all tools other than the base tool Generalist_Solution_Generator) in OctoTools.
  • ...and 13 more figures
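
The Figure 1 caption mentions a task-specific toolset optimization that selects a beneficial subset of tools, without giving its details here. One plausible way to sketch such a selection is a greedy pass that keeps a candidate tool only if adding it to the base toolset improves validation accuracy. This is an assumption for illustration, not a statement of the paper's algorithm: `optimize_toolset`, `toy_accuracy`, and the tool names `tool_a` and `tool_b` are hypothetical, and the accuracy numbers are made up for the toy evaluator.

```python
from typing import Callable, FrozenSet, Iterable, Set


def optimize_toolset(
    base: Set[str],
    candidates: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Set[str]:
    """Greedy selection (assumed strategy): keep each candidate tool whose
    addition to the base toolset beats the base-only validation accuracy."""
    baseline = evaluate(frozenset(base))
    selected = set(base)
    for tool in candidates:
        if evaluate(frozenset(base | {tool})) > baseline:
            selected.add(tool)
    return selected


# Toy evaluator with invented per-tool effects on validation accuracy.
# A real evaluator would run the agent on a held-out validation set.
EFFECT = {"tool_a": 0.05, "tool_b": -0.02}


def toy_accuracy(toolset: FrozenSet[str]) -> float:
    return 0.5 + sum(EFFECT.get(tool, 0.0) for tool in toolset)


if __name__ == "__main__":
    chosen = optimize_toolset(
        {"Generalist_Solution_Generator"}, ["tool_a", "tool_b"], toy_accuracy
    )
    print(sorted(chosen))
```

Evaluating each candidate independently against the base keeps the number of validation runs linear in the number of tools, at the cost of ignoring interactions between tools.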