Understanding Web Application Workloads and Their Applications: Systematic Literature Review and Characterization
Roozbeh Aghili, Qiaolin Qin, Heng Li, Foutse Khomh
TL;DR
The paper addresses the lack of a systematic understanding of web application workloads in two ways: it conducts a systematic literature review of studies that use public web workloads, and it characterizes those workloads. It identifies 78 articles and 12 publicly available datasets, revealing three daily and three weekly workload patterns that are non-monotonic and best captured by polynomial models. The authors develop a complete characterization pipeline (data extraction, aggregation to daily and weekly granularity, standardization, smoothing, variability analysis, and K-Means clustering) that yields centroid models and insights into time dependence across days and seasons. These findings inform realistic workload generation and proactive resource provisioning, and the authors advocate sharing newer datasets to reflect current web dynamics.
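The characterization pipeline summarized above can be illustrated in a few dozen lines of standard-library Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the workload arrives as an hourly request-count series. It uses z-score standardization, a centered 3-hour moving average for smoothing, and plain Lloyd's K-Means with deterministic farthest-first seeding. Each centroid is then modeled with a least-squares cubic polynomial. The function names (`daily_profiles`, `standardize`, `smooth`, `kmeans`, `polyfit`, `characterize`) and every parameter default are hypothetical choices, and the variability-analysis step is omitted.

```python
# Hypothetical sketch: names, defaults, and the hourly-input assumption are
# illustrative only; they are not taken from the paper.
import statistics
from typing import List, Sequence, Tuple

Profile = List[float]


def daily_profiles(hourly: Sequence[float], hours_per_day: int = 24) -> List[Profile]:
    """Aggregate an hourly series into one profile per day (trailing partial day dropped)."""
    n_days = len(hourly) // hours_per_day
    return [list(hourly[d * hours_per_day:(d + 1) * hours_per_day]) for d in range(n_days)]


def standardize(day: Sequence[float]) -> Profile:
    """Z-score a profile so days are compared by shape rather than by traffic volume."""
    mu, sd = statistics.fmean(day), statistics.pstdev(day)
    return [0.0] * len(day) if sd == 0 else [(x - mu) / sd for x in day]


def smooth(day: Sequence[float], window: int = 3) -> Profile:
    """Centered moving average; the window is truncated at the edges of the day."""
    half = window // 2
    return [statistics.fmean(day[max(0, i - half):i + half + 1]) for i in range(len(day))]


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles: List[Profile], k: int, max_iter: int = 100) -> Tuple[List[int], List[Profile]]:
    """Lloyd's K-Means with deterministic farthest-first seeding."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        # Next seed: the profile farthest from every seed chosen so far.
        far = max(profiles, key=lambda p: min(_sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda c: _sqdist(p, centroids[c])) for p in profiles]
        if new_labels == labels:
            break  # converged: assignments are stable
        labels = new_labels
        # Update step: each centroid becomes the hour-by-hour mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centroids


def polyfit(y: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; coefs[j] multiplies x**j, with time rescaled to [0, 1]."""
    n, m = len(y), degree + 1
    xs = [i / (n - 1) for i in range(n)]  # rescaled time axis keeps the system well conditioned
    # Normal equations (A^T A) c = A^T y, where A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    aty = [sum((x ** r) * yi for x, yi in zip(xs, y)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, m):
            f = ata[r][col] / ata[col][col]
            for c in range(col, m):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    coefs = [0.0] * m
    for r in reversed(range(m)):  # back substitution
        coefs[r] = (aty[r] - sum(ata[r][c] * coefs[c] for c in range(r + 1, m))) / ata[r][r]
    return coefs


def characterize(hourly: Sequence[float], k: int = 3, window: int = 3, degree: int = 3):
    """Aggregate -> standardize -> smooth -> cluster -> fit a polynomial to each centroid."""
    profiles = [smooth(standardize(day), window) for day in daily_profiles(hourly)]
    labels, centroids = kmeans(profiles, k)
    models = [polyfit(c, degree) for c in centroids]
    return labels, centroids, models
```

As a usage illustration with synthetic input, consider calling `characterize(hourly, k=2)` on an hourly series whose days alternate between a single midday peak and a morning-and-evening double peak at different volumes. Same-shaped days land in the same cluster because standardization removes scale before clustering. Weekly patterns could follow the same route by cutting the series into 168-hour windows instead of 24-hour ones.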
Abstract
Web applications, accessible via web browsers over the Internet, provide complex functionality without requiring local software installation. In the context of web applications, a workload refers to the number of requests sent by users or applications to the underlying system. Existing studies have leveraged web application workloads to achieve various objectives, such as workload prediction and auto-scaling. However, these studies are conducted in an ad hoc manner and lack a systematic understanding of the characteristics of web application workloads. In this study, we first conduct a systematic literature review to identify and analyze existing studies that leverage web application workloads. Our analysis sheds light on how these studies use workloads, which analysis techniques they apply, and what high-level objectives they pursue. We then systematically analyze the characteristics of the web application workloads identified in the literature review. Our analysis centers on characterizing these workloads at two distinct temporal granularities: daily and weekly. We identify and categorize three daily and three weekly patterns within the workloads. By providing a statistical characterization of these workload patterns, our study highlights the uniqueness of each pattern, paving the way for the development of realistic workload generation and resource provisioning techniques that can benefit a range of applications and research areas.
