Sibyl: Forecasting Time-Evolving Query Workloads

Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesus Camacho-Rodriguez, Yuanyuan Tian

TL;DR

Sibyl addresses time-evolving database workloads by forecasting the exact query statements expected in a future window. It introduces a template-based representation and a specialized encoder-decoder model, Sibyl-LSTMs, for next-$k$ forecasting, then extends to next-$\Delta t$ forecasting by cutting and packing templates into per-bin models for scalability. A feedback loop enables incremental adaptation to workload drift. On four real workloads, Sibyl reaches an $87.3\%$ median F1 score and improves materialized view selection and index selection performance by $1.7\times$ and $1.3\times$, respectively. The work shows how precise, end-to-end workload forecasts let existing DBMS optimization tools operate effectively under evolving workloads.
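To make the template-based representation concrete, the sketch below separates a query into a template and its parameter values, then groups a trace by template. It is an illustration only, not the paper's featurization: the regular expression, the `$n` placeholder syntax, and the names `to_template` and `group_by_template` are all assumptions introduced here.

```python
import re

# Assumed literal pattern (hypothetical): single-quoted strings or bare numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def to_template(query):
    """Split a query into (template, parameters).

    Every literal is replaced by a positional placeholder ($1, $2, ...),
    and the literal values are returned in order of appearance.
    """
    params = []

    def _swap(match):
        params.append(match.group(0))
        return f"${len(params)}"

    template = _LITERAL.sub(_swap, query)
    return template, params


def group_by_template(queries):
    """Map each template to the list of parameter vectors seen for it."""
    groups = {}
    for q in queries:
        template, params = to_template(q)
        groups.setdefault(template, []).append(params)
    return groups


trace = [
    "SELECT * FROM logs WHERE day = 7 AND region = 'eu'",
    "SELECT * FROM logs WHERE day = 8 AND region = 'us'",
]
groups = group_by_template(trace)
```

Under this view, forecasting full query statements reduces to forecasting, per template, how the parameter vectors evolve over time; the template text itself stays fixed.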

Abstract

Database systems often rely on historical query traces to perform workload-based performance tuning. However, real production workloads are time-evolving, making historical queries ineffective for optimizing future workloads. To address this challenge, we propose SIBYL, an end-to-end machine learning-based framework that accurately forecasts a sequence of future queries, with the entire query statements, in various prediction windows. Drawing insights from real workloads, we propose template-based featurization techniques and develop a stacked-LSTM with an encoder-decoder architecture for accurate forecasting of query workloads. We also develop techniques to improve forecasting accuracy over large prediction windows and achieve high scalability over large workloads with high variability in arrival rates of queries. Finally, we propose techniques to handle workload drifts. Our evaluation on four real workloads demonstrates that SIBYL can forecast workloads with an $87.3\%$ median F1 score, and can result in $1.7\times$ and $1.3\times$ performance improvement when applied to materialized view selection and index selection applications, respectively.
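The abstract reports accuracy as an F1 score but this summary does not state how a forecasted query is matched against an actual one. As a rough sketch, assuming exact string matching and treating the forecast and the ground truth for one window as multisets, precision, recall, and F1 could be computed as follows. The function name `forecast_f1` and the matching rule are assumptions, not the paper's definition.

```python
from collections import Counter


def forecast_f1(predicted, actual):
    """F1 between two multisets of query strings (assumed exact match)."""
    pred, act = Counter(predicted), Counter(actual)
    # Multiset intersection: each actual query can be matched at most once.
    matched = sum((pred & act).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred.values())
    recall = matched / sum(act.values())
    return 2 * precision * recall / (precision + recall)


score = forecast_f1(["q1", "q2", "q2"], ["q2", "q2", "q3", "q4"])
```

In this example two of the three forecasted queries occur in the actual window (precision 2/3) and two of the four actual queries were forecasted (recall 1/2), so the harmonic mean is 4/7.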

Paper Structure (38 sections, 5 equations, 13 figures, 11 tables, 1 algorithm)

Figures (13)

  • Figure 1: An example of parameterized query.
  • Figure 2: Characterizing time-evolving patterns and their predictability using queries from the Telemetry workload.
  • Figure 3: (a) The histogram of ApEn on parameters of Telemetry workload. (b) The negative correlation between parameter forecasting accuracy (using vanilla LSTM) and ApEn.
  • Figure 4: Sibyl Overview. (a) shows the four components of Sibyl for next-$\Delta t$ forecasting. (b) shows the three phases of Sibyl.
  • Figure 5: (a) One LSTM layer. (b) Sibyl-LSTMs.
  • ...and 8 more figures
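The caption of Figure 3 refers to ApEn without expanding it. Assuming it denotes approximate entropy, a standard regularity measure for time series where lower values indicate more repetitive and hence more predictable sequences, a minimal sketch is given below. The defaults `m=2` and `r=0.2` are illustrative assumptions, not values taken from the paper.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy of a numeric sequence.

    m is the window length and r the tolerance (absolute units).
    Requires len(series) > m + 1.
    """
    n = len(series)

    def phi(width):
        windows = [series[i:i + width] for i in range(n - width + 1)]
        total = 0.0
        for w in windows:
            # Count windows within Chebyshev distance r (self-match included,
            # so the count is always at least 1 and the log is defined).
            close = sum(
                1 for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= r
            )
            total += math.log(close / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)


constant = approximate_entropy([1.0] * 20)
periodic = approximate_entropy([0.0, 1.0] * 10)
```

A constant parameter sequence yields exactly zero and a strictly alternating one yields a value close to zero, which matches the intuition behind the caption: parameters with low ApEn are the ones a sequence model should find easiest to forecast.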

Theorems & Definitions (2)

  • Definition 3.1
  • Definition 3.2