The Art of Tool Interface Design

Yunnan Wu, Paul Chen, Deshank Baranwal, Jinlong Zhou, Jian Yuan

TL;DR

The paper introduces Thinker, an agent framework that achieves state-of-the-art results on challenging customer-service tasks by combining State-Machine Augmented Generation (SMAG), delegation to LLM-powered tools, and adaptive context management. SMAG encodes business logic as flows (state machines) that the LLM drives, enforcing correct sequencing and rules without sacrificing conversational flexibility. By offloading specific reasoning tasks to dedicated tools and optimizing the context fed to the model, Thinker improves on strong prompting baselines on the τ-bench retail dataset for both GPT-4o (68.3% to 82.6% success rate) and Llama-3.1 405B (49.6% to 81.9%). Ablations demonstrate the value of each component, especially SMAG and tool-based delegation. The results suggest that careful tool interface design and targeted context management can yield high performance in real-world, long-horizon reasoning tasks without fine-tuning, with practical implications for scalable deployment of LLM agents in customer service.

Abstract

We present Thinker, an agentic framework that achieves state-of-the-art performance on challenging reasoning tasks in realistic customer service scenarios involving complex business logic and long-horizon human interactions. On the τ-bench retail dataset, Thinker achieves an 82.6% success rate with GPT-4o (version 2024-06-01) (baseline: 68.3%) and an 81.9% success rate with Llama-3.1 405B (baseline: 49.6%), without any fine-tuning. Thinker effectively closes the gap in reasoning capabilities between the base models by introducing proper structure. The key features of the Thinker framework are: (1) State-Machine Augmented Generation (SMAG), which represents business logic as state machines that the LLM uses as tools; (2) delegation of tasks from the main reasoning loop to LLM-powered tools; and (3) adaptive context management. Our prompting-only solution achieves significant gains while still maintaining a standard agentic architecture with a ReAct-style reasoning loop. The key is to innovate on the tool interface design, as exemplified by SMAG and the LLM-powered tools.
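
To make the SMAG idea concrete, the following is a minimal sketch of a state machine exposed as a tool. It is not the paper's implementation. The flow (a retail return), the state and action names, and the `FlowTool` class are all hypothetical, chosen only to illustrate how a tool can reject out-of-sequence actions and report which actions are currently allowed.

```python
from dataclasses import dataclass, field

# Hypothetical return flow: each state maps allowed actions to the next state.
# These names are illustrative assumptions, not taken from the paper.
TRANSITIONS = {
    "start": {"authenticate_user": "authenticated"},
    "authenticated": {"select_order": "order_selected"},
    "order_selected": {"confirm_with_user": "confirmed"},
    "confirmed": {"submit_return": "done"},
    "done": {},
}


@dataclass
class FlowTool:
    """A state machine wrapped as a tool the LLM could call (assumed design)."""

    state: str = "start"
    history: list = field(default_factory=list)

    def allowed_actions(self):
        # Actions that are valid from the current state, in a stable order.
        return sorted(TRANSITIONS[self.state])

    def call(self, action):
        # Tool entry point: advance the flow or explain why the action is rejected.
        next_state = TRANSITIONS[self.state].get(action)
        if next_state is None:
            return {
                "ok": False,
                "state": self.state,
                "allowed": self.allowed_actions(),
                "error": f"'{action}' is not allowed in state '{self.state}'",
            }
        self.history.append(action)
        self.state = next_state
        return {"ok": True, "state": self.state, "allowed": self.allowed_actions()}


if __name__ == "__main__":
    flow = FlowTool()
    print(flow.call("submit_return"))      # rejected: the user is not authenticated yet
    print(flow.call("authenticate_user"))  # accepted: the flow advances
```

In this sketch the sequencing rules live in the tool rather than in the prompt: the model remains free to converse, but an action such as `submit_return` only succeeds once the preceding states have been visited, and each response tells the model which actions are valid next.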

Paper Structure

This paper contains 22 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Thinker: An Agentic Framework.
  • Figure 2: Illustrated example execution log with Llama-3.