Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

Chaofan Lin; Zhenhua Han; Chengruidong Zhang; Yuqing Yang; Fan Yang; Chen Chen; Lili Qiu

TL;DR

Parrot tackles the inefficiency of serving LLM-based applications through public LLM services by exposing application-level information via Semantic Variables, which enables end-to-end optimization of LLM-based workflows. It introduces DAG-based inter-request analysis and prompt-structure analysis to discover dependencies and common prefixes across requests, enabling dependent-request batching, performance-objective deduction, and prompt sharing. A novel GPU kernel accelerates attention over shared contexts, and a universal engine abstraction allows flexible integration of different LLM engines. Extensive evaluations on long-document analytics, chat, and multi-agent tasks show end-to-end speedups of up to 11.7x and significant throughput improvements, highlighting Parrot's practical value for scalable, multi-tenant LLM applications.
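To make the prompt-structure analysis concrete, here is a minimal Python sketch of prefix detection. It assumes, hypothetically, that each request is a prompt template in which Semantic Variables appear as `{{input:name}}` or `{{output:name}}` placeholders, and that the constant text before the first placeholder is the shareable part. Requests with an identical static prefix are grouped so that the prefix could, in principle, be computed once and reused. The placeholder syntax and the helpers `static_prefix` and `group_by_prefix` are illustrative assumptions, not Parrot's code, and exact-match grouping is a simplification of whatever prefix detection Parrot actually performs.

```python
import re
from collections import defaultdict

# Hypothetical placeholder syntax for Semantic Variables, e.g. {{input:query}}.
PLACEHOLDER = re.compile(r"\{\{(input|output):(\w+)\}\}")


def static_prefix(template: str) -> str:
    """Return the constant text that precedes the first Semantic Variable."""
    match = PLACEHOLDER.search(template)
    return template if match is None else template[: match.start()]


def group_by_prefix(templates: dict[str, str]) -> dict[str, list[str]]:
    """Map each static prefix shared by two or more requests to their ids."""
    groups: dict[str, list[str]] = defaultdict(list)
    for request_id, template in templates.items():
        prefix = static_prefix(template)
        if prefix:  # a template that starts with a variable has nothing to share
            groups[prefix].append(request_id)
    # Keep only prefixes that at least two requests have in common.
    return {p: ids for p, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    system = "You are a helpful search assistant. Answer concisely.\n"
    requests = {
        "r1": system + "User: {{input:query}}\nAnswer: {{output:answer}}",
        "r2": system + "User: {{input:query}}\nAnswer: {{output:answer}}",
        "r3": "Summarize: {{input:doc}}\nSummary: {{output:summary}}",
    }
    print(list(group_by_prefix(requests).values()))  # [['r1', 'r2']]
```

In a serving system, the sharing itself would happen on tokens and cached attention state rather than on raw strings; the sketch only identifies which requests are candidates for sharing.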

Abstract

The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strengths of LLMs and conventional software. Diverse LLM applications from different tenants can design complex workflows that use multiple LLM requests to accomplish one task. However, they have to use the over-simplified request-level API provided by today's public LLM services, losing essential application-level information. Public LLM services have to blindly optimize individual LLM requests, leading to sub-optimal end-to-end performance of LLM applications. This paper introduces Parrot, an LLM service system that focuses on the end-to-end experience of LLM-based applications. Parrot proposes Semantic Variable, a unified abstraction to expose application-level knowledge to public LLM services. A Semantic Variable annotates an input/output variable in the prompt of a request and forms the data pipeline that connects multiple LLM requests, providing a natural way to program LLM applications. Exposing Semantic Variables to the public LLM service allows it to perform conventional data flow analysis to uncover the correlation across multiple LLM requests. This correlation opens a new optimization space for the end-to-end performance of LLM-based applications. Extensive evaluations demonstrate that Parrot can achieve up to an order-of-magnitude improvement for popular and practical use cases of LLM applications.
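The data flow analysis described in the abstract can be sketched in a few lines of Python, assuming a hypothetical `{{input:name}}`/`{{output:name}}` placeholder convention for Semantic Variables. A request that declares an output variable becomes that variable's producer, and every request that reads the variable as input gets an edge from the producer. A level-by-level topological sort (Kahn's algorithm) then yields stages of requests whose inputs are all available. The function names `variables`, `build_dag`, and `ready_stages` and the map-reduce example are assumptions chosen for illustration; they are not Parrot's API.

```python
import re

# Hypothetical placeholder syntax for Semantic Variables, e.g. {{output:s1}}.
PLACEHOLDER = re.compile(r"\{\{(input|output):(\w+)\}\}")


def variables(template: str, kind: str) -> set[str]:
    """Names of Semantic Variables of the given kind ('input' or 'output')."""
    return {name for k, name in PLACEHOLDER.findall(template) if k == kind}


def build_dag(requests: dict[str, str]) -> dict[str, set[str]]:
    """Return producer -> consumers edges derived from shared variable names."""
    producer: dict[str, str] = {}
    for rid, template in requests.items():
        for var in variables(template, "output"):
            producer[var] = rid
    edges: dict[str, set[str]] = {rid: set() for rid in requests}
    for rid, template in requests.items():
        for var in variables(template, "input"):
            if var in producer:  # otherwise the application supplies the value
                edges[producer[var]].add(rid)
    return edges


def ready_stages(edges: dict[str, set[str]]) -> list[list[str]]:
    """Kahn's algorithm by levels: each stage depends only on earlier stages."""
    indegree = {rid: 0 for rid in edges}
    for consumers in edges.values():
        for c in consumers:
            indegree[c] += 1
    stage = sorted(rid for rid, d in indegree.items() if d == 0)
    stages: list[list[str]] = []
    while stage:
        stages.append(stage)
        nxt = []
        for rid in stage:
            for c in edges[rid]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    nxt.append(c)
        stage = sorted(nxt)
    if sum(len(s) for s in stages) != len(edges):
        raise ValueError("cycle detected among Semantic Variables")
    return stages


if __name__ == "__main__":
    # Map-reduce style summary: two map requests feed one reduce request.
    requests = {
        "map1": "Summarize: {{input:chunk1}}\n{{output:s1}}",
        "map2": "Summarize: {{input:chunk2}}\n{{output:s2}}",
        "reduce": "Combine: {{input:s1}} {{input:s2}}\n{{output:final}}",
    }
    print(ready_stages(build_dag(requests)))  # [['map1', 'map2'], ['reduce']]
```

Because the service sees the whole graph, `reduce` can be launched as soon as both map requests finish, instead of waiting for the application to collect `s1` and `s2` and submit a new request; that client round trip is the kind of overhead Figure 3 attributes to chatty interaction between LLM applications and LLM services.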

Paper Structure

This paper contains 42 sections, 1 equation, 19 figures, 2 tables, and 1 algorithm.

Introduction
Background
LLM Service.
LLM-based Applications.
Problems of Serving LLM Applications
Excessive Overhead of Consecutive Requests.
Misaligned Scheduling Objectives.
Redundant Computations.
Parrot Design
Semantic Variable
Primitives of Inter-Request Analysis
DAG-based analysis.
Prompt structure-based analysis.
Optimizations with Semantic Variable
Serving Dependent Requests
...and 27 more sections

Figures (19)

Figure 1: The workflow of popular LLM-based applications. The final result requires multiple LLM requests.
Figure 2: The communication of consecutive LLM requests in multi-agent applications.
Figure 3: The end-to-end latency breakdown of current LLM services. The source of the overhead comes from network and queuing due to chatty interaction between LLM application and LLM services, which is eliminated in our system Parrot.
Figure 4: Request-centric scheduling vs. application-centric scheduling for the map-reduce style document summary task.
Figure 5: The prompt structure of Bing Copilot shows a long prompt reused by different user queries.
...and 14 more figures