LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications

Botao Zhu, Chen Chen, Xiaoyi Fan, Yifei Zhu

TL;DR

This work addresses the scheduling of compound LLM applications, where uncertainty in both duration and structure challenges traditional DAG-based schedulers. It introduces LLMSched, a four-part framework combining a DAG-based depiction, a Bayesian-network profiler, an entropy-based uncertainty quantifier, and an uncertainty-aware scheduler that blends exploration with JCT efficiency via an $\epsilon$-greedy strategy. By identifying uncertainty-reducing stages and leveraging inter-stage correlations, LLMSched significantly reduces average JCT across simulations and a real testbed, outperforming state-of-the-art policies by up to $79\%$. The approach enables more reliable and scalable serving of complex LLM-driven pipelines, with practical implications for cloud providers and end-user experience. The results establish a principled pathway to manage both temporal and structural uncertainty in evolving compound LLM workloads.
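The $\epsilon$-greedy blend mentioned above can be illustrated with a minimal sketch. This is a generic $\epsilon$-greedy rule, not the paper's scheduler: the `Stage` fields, the `pick_stage` function, and the two scores are hypothetical names introduced here for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    expected_duration: float      # hypothetical estimate of the stage's runtime
    uncertainty_reduction: float  # hypothetical score: information gained by running it


def pick_stage(ready, epsilon, rng):
    """Generic epsilon-greedy choice among ready stages (illustrative assumption)."""
    if not ready:
        raise ValueError("no ready stages")
    if rng.random() < epsilon:
        # Explore: run the stage expected to reveal the most about the job.
        return max(ready, key=lambda s: s.uncertainty_reduction)
    # Exploit: run the stage with the smallest expected duration (SJF-like).
    return min(ready, key=lambda s: s.expected_duration)


ready = [Stage("a", 5.0, 0.1), Stage("b", 2.0, 0.0), Stage("c", 9.0, 0.9)]
rng = random.Random(0)
print(pick_stage(ready, 0.0, rng).name)  # always exploits
print(pick_stage(ready, 1.0, rng).name)  # always explores
```

With `epsilon` set to 0 the rule reduces to a shortest-expected-duration policy; raising it spends more scheduling decisions on stages that resolve uncertainty.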

Abstract

Developing compound Large Language Model (LLM) applications is becoming an increasingly prevalent approach to solving real-world problems. In these applications, an LLM collaborates with various external modules, including APIs and even other LLMs, to realize complex intelligent services. However, we reveal that the intrinsic duration and structural uncertainty in compound LLM applications pose great challenges for LLM service providers in serving and scheduling them efficiently. In this paper, we propose LLMSched, an uncertainty-aware scheduling framework for emerging compound LLM applications. In LLMSched, we first design a novel DAG-based model to describe uncertain compound LLM applications. Then, we adopt a Bayesian network to comprehensively profile compound LLM applications and identify uncertainty-reducing stages, along with an entropy-based mechanism to quantify their uncertainty reduction. Combining an uncertainty reduction strategy and a job completion time (JCT)-efficient scheme, we further propose an efficient scheduler to reduce the average JCT. Evaluations in both simulation and testbed experiments on various representative compound LLM applications show that, compared to existing state-of-the-art scheduling schemes, LLMSched reduces the average JCT by 14% to 79%.
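To make the idea of entropy-based uncertainty reduction concrete, the sketch below computes how much observing one stage's outcome reduces the entropy of another's, i.e. $H(Y) - H(Y \mid X)$. This is the standard information-theoretic quantity, used here as an assumption; the abstract does not specify the paper's exact formula. The function names and the toy joint distributions are hypothetical.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def uncertainty_reduction(joint):
    """Return H(Y) - H(Y|X) for a joint table {(x, y): probability}."""
    px = defaultdict(float)
    py = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    h_y = entropy(py.values())
    h_y_given_x = 0.0
    for x, p_x in px.items():
        if p_x == 0:
            continue
        # Conditional distribution of Y given X = x.
        cond = [p / p_x for (xi, _), p in joint.items() if xi == x]
        h_y_given_x += p_x * entropy(cond)
    return h_y - h_y_given_x


# Toy example: stage X's duration fully determines stage Y's duration.
correlated = {("short", "short"): 0.5, ("long", "long"): 0.5}
# Toy example: the two stage durations are independent.
independent = {(x, y): 0.25 for x in ("short", "long") for y in ("short", "long")}

print(uncertainty_reduction(correlated))
print(uncertainty_reduction(independent))
```

In the correlated case, running stage X removes one full bit of uncertainty about stage Y; in the independent case it removes none, so a scheduler would gain nothing from prioritizing X for its information value.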

Paper Structure

This paper contains 23 sections, 5 equations, 10 figures, 1 table, 1 algorithm.

Figures (10)

  • Figure 1: Runtime characteristics of three representative compound LLM applications: (a) sequence sorting; (b) code generation; (c) task automation.
  • Figure 2: An example of the benefits of considering uncertainty for scheduling compound LLM applications.
  • Figure 3: LLMSched Overview
  • Figure 4: LLM DAG representations for sequence sorting, code generation, and task automation.
  • Figure 5: Heatmap for the duration of the stages in two compound LLM applications. The axis records the stage IDs sorted in a topological order with respect to the DAG in Fig. 4.
  • ...and 5 more figures