Less is More: Optimizing Function Calling for LLM Execution on Edge Devices

Varatheepan Paramanayakam, Andreas Karatzas, Iraklis Anagnostopoulos, Dimitrios Stamoulis

Abstract

The advanced function-calling capabilities of foundation models open up new possibilities for deploying agents to perform complex API tasks. However, managing large amounts of data and interacting with numerous APIs makes function calling hardware-intensive and costly, especially on edge devices. Current Large Language Models (LLMs) struggle with function calling at the edge because they cannot handle complex inputs or manage multiple tools effectively. This results in low task-completion accuracy, increased delays, and higher power consumption. In this work, we introduce Less-is-More, a novel fine-tuning-free function-calling scheme for dynamic tool selection. Our approach is based on the key insight that selectively reducing the number of tools available to LLMs significantly improves their function-calling performance, execution time, and power efficiency on edge devices. Experimental results with state-of-the-art LLMs on edge hardware show agentic success rate improvements, with execution time reduced by up to 70% and power consumption by up to 40%.

Paper Structure

This paper contains 8 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Less-is-More considers three distinct tool representations (Search Levels): individual tools (Level 1), tool clusters (Level 2), or the entire tool set (Level 3), whose latent spaces are constructed offline. At runtime, given a query and without providing any tools, the LLM (Recommender) generates tool descriptions "ideal" for the task. Given the LLM-recommended tool embeddings, the Controller identifies the most relevant real tools based on their proximity in the latent space representations.
  • Figure 2: Performance comparison of our method (LiS) at varying $k$ values against the default approach, measured by Success Rate, Tool Accuracy, Normalized Execution Time, and Normalized Power for the BFCL (Berkeley Function-Calling Leaderboard) benchmark.
  • Figure 3: Performance comparison of our method (LiS) at varying $k$ values against the default approach, measured by Success Rate, Tool Accuracy, Normalized Execution Time, and Normalized Power for the GeoEngine benchmark [singh2024geollm].
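
The runtime step described in the Figure 1 caption (the Recommender proposes "ideal" tool descriptions, and the Controller picks the nearest real tools in the latent space) can be illustrated with a minimal sketch of Level 1 (individual tools) only. Everything concrete here is an assumption made for illustration: the bag-of-words `embed` function stands in for whatever sentence encoder the paper uses, and the names `select_tools`, `TOOLS`, and the example tool descriptions are hypothetical. Tool clusters (Level 2) and the choice between search levels are not modeled.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding. ASSUMPTION: a stand-in for a real
    sentence encoder; the paper's actual embedding model is not shown here."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_tools(recommended_descriptions, tools, k):
    """Return the names of the k real tools closest to the recommended ones.

    recommended_descriptions: "ideal" tool descriptions proposed by the LLM
    tools: mapping of real tool name -> description (embedded offline in practice)
    """
    tool_vecs = {name: embed(desc) for name, desc in tools.items()}
    scores = {}
    for rec in recommended_descriptions:
        rec_vec = embed(rec)
        for name, vec in tool_vecs.items():
            # Keep each tool's best similarity to any recommended description.
            scores[name] = max(scores.get(name, 0.0), cosine(rec_vec, vec))
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k]


# Hypothetical tool set, for illustration only.
TOOLS = {
    "get_weather": "get the current weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "convert_currency": "convert an amount of money between two currencies",
    "search_flights": "search for available flights between two airports",
}

# In the real pipeline this description would come from the LLM (Recommender).
recommended = ["look up the weather forecast for a given city"]
selected = select_tools(recommended, TOOLS, k=2)
print(selected)
```

Only the `k` selected tools would then be passed to the LLM for the actual function call, which is the reduction in tool count that the abstract credits for the accuracy, latency, and power gains. The parameter `k` here plays the role of the $k$ varied in Figures 2 and 3.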