BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems

Yuxin Wang; Yuhan Chen; Zeyu Li; Xueze Kang; Yuchu Fang; Yeju Zhou; Yang Zheng; Zhenheng Tang; Xin He; Rui Guo; Xin Wang; Qiang Wang; Amelie Chi Zhou; Xiaowen Chu

TL;DR

This paper presents BurstGPT, a real-world LLM serving workload derived from Azure OpenAI GPT service traces collected over 213 days, to address the lack of realistic workloads for evaluating serving systems. It analyzes user concurrency, conversation patterns, model response lengths, and system failures, and introduces BurstGPT-Perf, a modular benchmark suite for scalable, trace-driven evaluation. The study shows that real-world burstiness and distributional patterns stress KV-cache management and scheduling, exposing the limits of synthetic workloads and guiding workload-aware optimizations. The authors also demonstrate industry relevance through prefill-decode (PD) disaggregation and load provisioning, highlighting practical ways to reduce costs and improve QoS in production LLM services.

Abstract

Serving systems for Large Language Models (LLMs) are often optimized to improve quality of service (QoS) and throughput. However, due to the lack of open-source LLM serving workloads, these systems are frequently evaluated under unrealistic workload assumptions. Consequently, performance may degrade when systems are deployed in real-world scenarios. This work presents BurstGPT, an LLM serving workload with 10.31 million traces from regional Azure OpenAI GPT services over 213 days. BurstGPT captures LLM serving characteristics from user, model and system perspectives: (1) User request concurrency: burstiness variations of requests in Azure OpenAI GPT services, revealing diversified concurrency patterns in different services and model types. (2) User conversation patterns: counts and intervals within conversations for service optimizations. (3) Model response lengths: auto-regressive serving processes of GPT models, showing statistical relations between requests and their responses. (4) System response failures: failures of conversation and API services, showing intensive resource needs and limited availability of LLM services in Azure. The details of the characteristics can serve multiple purposes in LLM serving optimizations, such as system evaluation and trace provisioning. In our demo evaluation with BurstGPT, frequent variations in BurstGPT reveal declines in efficiency, stability, or reliability in realistic LLM serving. We identify that the generalization of KV cache management, scheduling and disaggregation optimizations can be improved under realistic workload evaluations. BurstGPT is publicly available now at https://github.com/HPMLL/BurstGPT and is widely used to develop prototypes of LLM serving frameworks in the industry.
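To make the notion of request burstiness concrete, the sketch below computes the coefficient of variation (CV) of inter-arrival times, one common burstiness indicator. This is an illustrative assumption and not necessarily the metric the authors use; the function name `interarrival_cv` and the two timestamp lists are hypothetical and are not taken from the BurstGPT trace.

```python
import statistics


def interarrival_cv(timestamps):
    """Return std/mean of the gaps between sorted arrival times.

    A CV near 0 means evenly spaced requests; a CV well above 1
    means arrivals cluster into bursts separated by long idle gaps.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean_gap = statistics.fmean(gaps)
    if mean_gap == 0:
        raise ValueError("all requests share the same timestamp")
    return statistics.pstdev(gaps) / mean_gap


# Hypothetical arrival times in seconds (not real trace data).
steady = [0, 10, 20, 30, 40, 50]   # one request every 10 s
bursty = [0, 1, 2, 3, 48, 50]      # same mean gap, clustered arrivals

print(interarrival_cv(steady))
print(interarrival_cv(bursty))
```

Both hypothetical streams have the same mean gap of 10 s, so an average request rate alone cannot tell them apart; the CV separates them, which illustrates why evaluation under a fixed-rate synthetic workload can hide behavior that a bursty trace exposes.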

Paper Structure (33 sections, 20 figures, 1 table)

Introduction
Preliminary and Motivation
Limitations of LLM Serving
Towards Workload-aware LLM Serving
User Concurrency Patterns
Model Patterns
Introduction to BurstGPT
User Request Concurrency
Long-term Patterns: Periodicity and Aperiodicity
Short-term Patterns: Variant Burstiness
User Conversation Patterns
Distribution
Interval Time
The More Requests, the Longer Intervals
Model Response Patterns
...and 18 more sections

Figures (20)

Figure 1: Data collection and use method of BurstGPT. BurstGPT is a real-world workload trace from the Azure OpenAI GPT service. A scaled sample from a period of BurstGPT can be used to optimize serving systems using specific methods, considering realistic concurrency and response patterns. Note that we open-sourced two versions of BurstGPT: a cleaned trace and a raw trace, with failure logs excluded from the cleaned version.
Figure 2: Weekly Periodicity of Conversation Services in BurstGPT.
Figure 3: Weekly Aperiodicity of API Services in BurstGPT.
Figure 4: Daily Periodicity of Conversation Services in BurstGPT.
Figure 5: Daily Aperiodicity of API Services in BurstGPT.
...and 15 more figures
