Efficient Function Orchestration for Large Language Models

Xiaoxia Liu, Peng Di, Cong Li, Jun Sun, Jingyi Wang

TL;DR

This paper addresses the inefficiency of sequential function calling by introducing LLMOrch, a framework that uses def-use data relations and mutual-exclusion control relations to orchestrate parallel function calls. It decouples query translation, relation discovery, and execution scheduling, enabling processor-aware, fault-tolerant parallelism for I/O- and compute-intensive tasks. Across diverse benchmarks and real-world scenarios, LLMOrch achieves competitive or superior latency speedups while reducing token costs and maintaining accuracy, with performance scaling roughly linearly with the number of processors. By presenting a principled approach based on a Function-call Relation Graph (FRG) together with explicit error-recovery mechanisms, the work offers a practical, open-source solution for reliable, scalable LLM-driven software orchestration.

Abstract

Function calling is a fundamental capability of today's large language models, but sequential function calling poses efficiency problems. Recent studies have proposed requesting function calls with parallelism support to alleviate this issue. However, they either delegate the concurrent function calls to users for execution, where the calls end up being executed sequentially, or overlook the relations among the function calls, which limits the efficiency gained. This paper introduces LLMOrch, an advanced framework for automated, parallel function calling in large language models. The key principle behind LLMOrch is to identify an available processor to execute a function call while preventing any single processor from becoming overburdened. To this end, LLMOrch models the data relations (i.e., def-use) among different function calls and coordinates their execution according to their control relations (i.e., mutual exclusion) as well as the working status of the underlying processors. Compared with state-of-the-art techniques, LLMOrch demonstrated comparable efficiency improvements in orchestrating I/O-intensive functions, while significantly outperforming them (2$\times$) on compute-intensive functions. LLMOrch's performance even showed a linear correlation with the number of allocated processors. We believe that these results highlight the potential of LLMOrch as an efficient solution for parallel function orchestration in the context of large language models.
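The scheduling principle stated in the abstract (pick an available processor without overloading any one of them, while respecting mutual exclusion) can be illustrated with a minimal Python sketch. This is not LLMOrch's implementation: the names `Processor`, `pick_processor`, and `assign` are hypothetical, load is approximated by the number of calls placed on a processor, and mutual exclusion is assumed to mean that two such calls must not share a processor.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    # Hypothetical model of a processor: a name plus the calls placed on it.
    name: str
    running: set = field(default_factory=set)


def pick_processor(call, processors, exclusive_pairs):
    """Return the least-loaded processor that holds no call excluding `call`.

    Returns None when every processor holds a conflicting call.
    """
    def conflicts(proc):
        return any(frozenset((call, other)) in exclusive_pairs
                   for other in proc.running)

    candidates = [p for p in processors if not conflicts(p)]
    if not candidates:
        return None
    # Assumption: load is the number of calls currently placed on a processor.
    return min(candidates, key=lambda p: len(p.running))


def assign(calls, processors, exclusive_pairs):
    """Place each call on a processor; defer calls that cannot be placed yet."""
    placement, deferred = {}, []
    for call in calls:
        proc = pick_processor(call, processors, exclusive_pairs)
        if proc is None:
            deferred.append(call)
        else:
            proc.running.add(call)
            placement[call] = proc.name
    return placement, deferred


if __name__ == "__main__":
    procs = [Processor("p1"), Processor("p2")]
    exclusive = {frozenset(("s3", "s4"))}
    print(assign(["s3", "s4"], procs, exclusive))
```

With two processors and one mutually exclusive pair, the two calls land on different processors; with a single processor, the second call is deferred until the first one is no longer placed there.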

Paper Structure

This paper contains 12 sections, 2 equations, 6 figures, 3 tables, 4 algorithms.

Figures (6)

  • Figure 1: Overview of LLMOrch. Each node in the Function-call Relation Graph is assigned a rank, represented by a number in the top-right yellow circle, which is computed from the def-use (data) relations. Function calls with the same rank are scheduled concurrently (for example, $s_1$/$s_2$ and $s_3$/$s_4$), though scheduling does not immediately trigger their execution. LLMOrch manages this coordination through their mutual-exclusion (control) relations and the current working status of the underlying processors. In this example, $p_1$ and $p_2$ represent two physical processors; $s_3$ and $s_4$ are assigned to them respectively because they are mutually exclusive function calls.
  • Figure 2: The function call sequence of our illustrative example after query translation. The grammar is similar to those used by ReAct and LLMCompiler. Each function call is given a unique ID (e.g., ), which also serves as the result of the function call. Different function calls have explicit data dependencies and implicit control exclusions, which determine their order in subsequent scheduling and execution.
  • Figure 3: FRG for the example in RQ2.
  • Figure 4: Latency speedups with allocated processors on KITTI$^\ast$.
  • Figure 5:
  • ...and 1 more figure
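The rank assignment described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names `compute_ranks` and `group_by_rank` are hypothetical, ranks are assumed to start at 1 and to equal one plus the highest rank among the calls whose results are consumed, and the def-use edges in the example are invented for illustration because the figure's actual edges are not given here.

```python
def compute_ranks(uses):
    """Compute a rank for every call from its def-use relations.

    `uses` maps a call ID to the IDs of the calls whose results it consumes.
    Assumption: a call with no inputs has rank 1; otherwise its rank is one
    more than the highest rank among the calls it uses.
    """
    ranks = {}
    visiting = set()

    def rank(call):
        if call in ranks:
            return ranks[call]
        if call in visiting:
            raise ValueError(f"cyclic def-use relation at {call!r}")
        visiting.add(call)
        deps = uses.get(call, ())
        result = 1 + max((rank(dep) for dep in deps), default=0)
        visiting.discard(call)
        ranks[call] = result
        return result

    for call in uses:
        rank(call)
    return ranks


def group_by_rank(ranks):
    """Group call IDs by rank; calls in one group may be scheduled together."""
    groups = {}
    for call, r in ranks.items():
        groups.setdefault(r, []).append(call)
    return [sorted(groups[r]) for r in sorted(groups)]


if __name__ == "__main__":
    # Hypothetical def-use edges, chosen only so that s1/s2 and s3/s4 share ranks.
    uses = {"s1": [], "s2": [], "s3": ["s1"], "s4": ["s2"], "s5": ["s3", "s4"]}
    print(group_by_rank(compute_ranks(uses)))
```

Each inner list is a set of calls that share a rank and can therefore be scheduled at the same time; whether they actually run together still depends on the mutual-exclusion relations and processor status described in the caption.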