Efficient Function Orchestration for Large Language Models
Xiaoxia Liu, Peng Di, Cong Li, Jun Sun, Jingyi Wang
TL;DR
Efficient Function Orchestration for Large Language Models addresses the inefficiency of sequential function calling by introducing LLMOrch, a framework that leverages def-use data-relations and mutual-exclusion control-relations to orchestrate parallel function calls. It decouples query translation, relation discovery, and execution scheduling, enabling processor-aware, fault-tolerant parallelism for IO- and compute-intensive tasks. Across diverse benchmarks and real-world scenarios, LLMOrch achieves competitive or superior latency speedups while reducing token costs and maintaining accuracy, with performance scaling roughly linearly with the number of processors. By presenting a principled FRG-based approach and explicit error-recovery mechanisms, the work offers a practical, open-source solution for reliable, scalable LLM-driven software orchestration.
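To make the def-use data relation concrete, the following minimal Python sketch groups a plan of function calls into stages whose members can run in parallel. It is an illustration under stated assumptions, not LLMOrch's relation-discovery component. It assumes, hypothetically, that each call defines exactly one output variable and lists the variables it uses. The `Call` class, the `def_use_edges` and `parallel_stages` helpers, and the weather-comparison plan are all invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """A hypothetical function call: `out` is the variable it defines (def),
    `args` are the variables it reads (use)."""
    name: str
    out: str
    args: list[str] = field(default_factory=list)


def def_use_edges(calls):
    """Return (producer, consumer) pairs where the consumer uses a variable
    that the producer defines. Arguments not defined by any call are ignored."""
    definer = {c.out: c.name for c in calls}
    edges = []
    for c in calls:
        for a in c.args:
            if a in definer:
                edges.append((definer[a], c.name))
    return edges


def parallel_stages(calls):
    """Group calls into stages. Every call in a stage has all of its def-use
    predecessors in earlier stages, so calls within one stage are independent."""
    preds = {c.name: set() for c in calls}
    for src, dst in def_use_edges(calls):
        preds[dst].add(src)
    done, stages = set(), []
    remaining = [c.name for c in calls]
    while remaining:
        ready = [n for n in remaining if preds[n] <= done]
        if not ready:
            raise ValueError("cyclic def-use relation")
        stages.append(ready)
        done.update(ready)
        remaining = [n for n in remaining if n not in done]
    return stages


# Hypothetical plan: fetch two forecasts, then compare them.
calls = [
    Call("weather_sf", "w1"),
    Call("weather_ny", "w2"),
    Call("compare", "c", ["w1", "w2"]),
]
print(parallel_stages(calls))  # [['weather_sf', 'weather_ny'], ['compare']]
```

The two forecast calls share no def-use edge and land in the same stage, while `compare` must wait for both. A real orchestrator would also need to handle calls that define several values or read user-supplied constants, which this sketch deliberately ignores.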
Abstract
Function calling is a fundamental capability of today's large language models, but sequential function calling poses efficiency problems. Recent studies have proposed requesting function calls with parallelism support to alleviate this issue. However, these approaches either delegate the concurrent function calls to the user side, where they are nonetheless executed sequentially, or overlook the relations among function calls, resulting in limited efficiency. This paper introduces LLMOrch, an advanced framework for automated, parallel function calling in large language models. The key principle behind LLMOrch is to identify an available processor to execute each function call while preventing any single processor from becoming overburdened. To this end, LLMOrch models the data relations (i.e., def-use) among function calls and coordinates their execution according to their control relations (i.e., mutual exclusion) as well as the working status of the underlying processors. Compared with state-of-the-art techniques, LLMOrch achieved comparable efficiency improvements when orchestrating I/O-intensive functions, while significantly outperforming them (by 2$\times$) on compute-intensive functions. LLMOrch's performance also showed a linear correlation with the number of allocated processors. We believe these results highlight the potential of LLMOrch as an efficient solution for parallel function orchestration in the context of large language models.
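The scheduling principle described above can be illustrated with greedy list scheduling. The principle is to start a call on an idle processor once its def-use predecessors have finished, and never to run two mutually exclusive calls at the same time. The sketch below is a time simulation under stated assumptions, not LLMOrch's scheduler. Each hypothetical call carries a fixed duration, a set of predecessor names, and an optional mutual-exclusion group label. The `schedule` function returns the simulated makespan rather than executing any function.

```python
import heapq


def schedule(calls, num_procs):
    """Greedy list scheduling over a simulated clock.

    calls: dict name -> (duration, set of predecessor names, mutex group or None).
    Whenever a processor is free, start the first pending call whose predecessors
    have finished and whose mutex group is not currently running.
    Returns the makespan in simulated time units."""
    finished, running_groups = set(), set()
    pending = list(calls)  # insertion order doubles as priority
    events = []  # min-heap of (finish_time, name)
    time, free = 0.0, num_procs
    while pending or events:
        # Start as many ready calls as the free processors allow.
        for name in list(pending):
            if free == 0:
                break
            dur, preds, group = calls[name]
            if preds <= finished and (group is None or group not in running_groups):
                pending.remove(name)
                free -= 1
                if group is not None:
                    running_groups.add(group)
                heapq.heappush(events, (time + dur, name))
        if not events:
            raise ValueError("no call can start: unsatisfiable dependencies")
        # Advance the clock to the next completion; release its processor and mutex.
        time, name = heapq.heappop(events)
        finished.add(name)
        free += 1
        group = calls[name][2]
        if group is not None:
            running_groups.discard(group)
    return time


# Four independent, equally long (hypothetical) compute-bound calls.
independent = {n: (1.0, set(), None) for n in "abcd"}
for p in (1, 2, 4):
    print(p, schedule(independent, p))  # makespans 4.0, 2.0, 1.0

# Two calls in the same mutex group are serialized even with idle processors.
exclusive = {"w1": (1.0, set(), "db"), "w2": (1.0, set(), "db")}
print(schedule(exclusive, 4))  # 2.0
```

With independent calls of equal length, doubling the processor count halves the simulated makespan. This matches the intuition behind linear scaling for compute-intensive calls, although real workloads with dependencies or mutual exclusion will scale less cleanly.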
