Table of Contents
Fetching ...

Achieving Tool Calling Functionality in LLMs Using Only Prompt Engineering Without Fine-Tuning

Shengtao He

TL;DR

The paper addresses enabling tool calling in LLMs without fine-tuning, addressing the resource-intensive bottleneck of existing approaches. It proposes a prompt-engineering framework composed of prompt injection to expose tool APIs in the system prompt and tool result feedback to feed tool outputs back into the model via regex-based parsing and an iterative dispatch loop. Empirical results on quantized open-source LLMs show successful tool calling across diverse tasks, with a 100% reported success rate, though Python-interpreter execution and complex knowledge-graph reasoning reveal model-specific limitations. The work demonstrates significant time/resource savings and practical viability for deploying tool-aware LLMs through prompts, and provides an open-source project for reproducibility.

Abstract

Currently, the vast majority of locally deployed open-source large language models (LLMs) and some commercial model interfaces do not support stable tool calling functionality. The existing solution involves fine-tuning LLMs, which results in significant time and computational resource consumption. This paper proposes a method that enables LLMs to achieve stable tool calling capabilities using only prompt engineering and some ingenious code design. We conducted experiments on multiple LLMs that lack tool calling capabilities across various tool calling tasks, achieving a success rate of 100%.

Achieving Tool Calling Functionality in LLMs Using Only Prompt Engineering Without Fine-Tuning

TL;DR

The paper addresses enabling tool calling in LLMs without fine-tuning, addressing the resource-intensive bottleneck of existing approaches. It proposes a prompt-engineering framework composed of prompt injection to expose tool APIs in the system prompt and tool result feedback to feed tool outputs back into the model via regex-based parsing and an iterative dispatch loop. Empirical results on quantized open-source LLMs show successful tool calling across diverse tasks, with a 100% reported success rate, though Python-interpreter execution and complex knowledge-graph reasoning reveal model-specific limitations. The work demonstrates significant time/resource savings and practical viability for deploying tool-aware LLMs through prompts, and provides an open-source project for reproducibility.

Abstract

Currently, the vast majority of locally deployed open-source large language models (LLMs) and some commercial model interfaces do not support stable tool calling functionality. The existing solution involves fine-tuning LLMs, which results in significant time and computational resource consumption. This paper proposes a method that enables LLMs to achieve stable tool calling capabilities using only prompt engineering and some ingenious code design. We conducted experiments on multiple LLMs that lack tool calling capabilities across various tool calling tasks, achieving a success rate of 100%.
Paper Structure (4 sections, 1 figure, 1 table)

This paper contains 4 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: (a) Output of gemma2-9b without using prompt engineering; (b) Output of gemma2-9b using prompt engineering