GeckOpt: LLM System Efficiency via Intent-Based Tool Selection

Michael Fore, Simranjit Singh, Dimitrios Stamoulis

TL;DR

GeckOpt tackles the inefficiency of LLM-driven tool selection by introducing an intent-based gating stage that narrows the pool of APIs before concrete tool calls. The method relies on an offline mapping of tasks to intents and an online phase where the LLM first infers intent and gates API libraries, enabling multi-tool execution within fewer GPT steps and reducing token usage. Experiments on a GeoLLM-Engine Copilot-like platform show token reductions of up to 24.6% with only marginal performance loss, indicating substantial potential for cloud-cost savings and hardware efficiency. The approach is fully GPT-driven, resilient to incorrect initial tool selection, and framed as a step toward generalizable, resource-efficient LLM orchestration; future work includes broader benchmarks and local LLM deployment.
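The two-phase idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the intent names, the library and tool names, the `gate_tools` helper, and the keyword-based `keyword_intent` function, which stands in for the LLM call that infers intent in the real system. The fallback to the full toolset on an unrecognized intent is likewise one possible recovery strategy, not necessarily the one the paper uses.

```python
from typing import Callable, Dict, List

# Offline phase (hypothetical data): map each intent to the API libraries it needs.
INTENT_TO_LIBRARIES: Dict[str, List[str]] = {
    "load_filter_plot": ["data_io", "filtering", "plotting"],
    "detect_objects": ["data_io", "vision_models"],
}

# Hypothetical registry of the tools each library exposes.
LIBRARY_TO_TOOLS: Dict[str, List[str]] = {
    "data_io": ["load_images", "save_results"],
    "filtering": ["filter_by_date", "filter_by_region"],
    "plotting": ["plot_on_map"],
    "vision_models": ["detect_objects"],
    "knowledge_base": ["search_wiki"],
}

# The ungated baseline: every tool is exposed to the model on every call.
ALL_TOOLS: List[str] = sorted(
    tool for tools in LIBRARY_TO_TOOLS.values() for tool in tools
)


def gate_tools(prompt: str, classify_intent: Callable[[str], str]) -> List[str]:
    """Online phase: infer the intent, then expose only the gated libraries."""
    intent = classify_intent(prompt)
    libraries = INTENT_TO_LIBRARIES.get(intent)
    if libraries is None:
        # Assumed fallback: an unrecognized intent exposes the full toolset.
        return ALL_TOOLS
    return [tool for lib in libraries for tool in LIBRARY_TO_TOOLS[lib]]


def keyword_intent(prompt: str) -> str:
    """Toy stand-in for the LLM intent-inference step (assumption)."""
    text = prompt.lower()
    if "plot" in text:
        return "load_filter_plot"
    if "detect" in text:
        return "detect_objects"
    return "unknown"


if __name__ == "__main__":
    gated = gate_tools("Plot all images from 2020 over Texas", keyword_intent)
    print(f"{len(gated)} of {len(ALL_TOOLS)} tools exposed: {gated}")
```

In this toy setup the plotting prompt exposes five of the seven registered tools, so fewer tool descriptions would be sent to the model. That is the mechanism by which gating can lower token usage, although the actual savings depend on the real tool schemas and prompts.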

Abstract

In this preliminary study, we investigate a GPT-driven intent-based reasoning approach to streamline tool selection for large language models (LLMs) aimed at system efficiency. By identifying the intent behind user prompts at runtime, we narrow down the API toolset required for task execution, reducing token consumption by up to 24.6%. Early results on a real-world, massively parallel Copilot platform with over 100 GPT-4-Turbo nodes show cost reductions and potential towards improving LLM-based system efficiency.

Paper Structure (2 sections, 2 tables)

This paper contains 2 sections, 2 tables.