E2ETune: End-to-End Knob Tuning via Fine-tuned Generative Language Model

Xinmei Huang, Haoyang Li, Jing Zhang, Xinxin Zhao, Zhiming Yao, Yiyan Li, Tieying Zhang, Jianjun Chen, Hong Chen, Cuiping Li

TL;DR

E2ETune reframes database knob tuning as an end-to-end task: a fine-tuned generative language model predicts promising knob configurations directly from workload characteristics. It introduces an offline data-generation pipeline covering OLAP and OLTP workloads, together with a cost model that replaces expensive workload executions during training-data collection, which makes LM fine-tuning scalable. In experiments on ten representative benchmarks and three real-world workloads, E2ETune finds competitive or superior configurations in substantially less tuning time than traditional tuners, and it generalizes across schemas. The approach demonstrates the practical potential of LM-based end-to-end tuning for cloud-like environments and is released with open artifacts to support further research.

Abstract

Database knob tuning is a significant challenge for database administrators, as it involves tuning a large number of configuration knobs with continuous or discrete values to achieve optimal database performance. Traditional methods, such as manual tuning or learning-based approaches, typically require numerous workload replays and are both time-consuming and resource-intensive. To address this challenge, we introduce E2ETune, an end-to-end knob tuner powered by a fine-tuned generative language model. The key idea is to leverage the exceptional sequence-to-sequence modeling capabilities of generative language models to capture the complex mapping between workloads (inputs) and their corresponding promising configurations (outputs). To achieve this goal, we propose a novel data generation framework to efficiently produce a large amount of training data, where each data sample consists of a workload and its promising configuration. Then, these data are used to fine-tune a generative language model, yielding an end-to-end knob tuner. This tuner offers out-of-the-box configuration recommendations for new workloads. We conduct extensive experiments to evaluate E2ETune's efficiency and effectiveness using 10 representative and 3 real-world benchmarks. Compared to state-of-the-art methods, E2ETune can identify competitive configurations in significantly less time.
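
As a rough illustration of the "workload in, configuration out" framing described in the abstract, the sketch below pairs a workload with a configuration as one text-to-text training sample. Everything in it is an assumption made for illustration: the function names `build_sample` and `parse_config`, the prompt wording, the JSON serialization, and the knob names and values (modeled loosely on PostgreSQL-style memory settings) do not come from the paper, whose actual input and output formats are not given in this summary.

```python
import json


def build_sample(workload_sql, config):
    """Pair a workload (input text) with a promising configuration (target text).

    Hypothetical format: the prompt layout and JSON target are illustrative
    assumptions, not the format used by E2ETune.
    """
    prompt = "Workload:\n" + "\n".join(workload_sql) + "\nRecommend knob configuration:"
    # Sort keys so the same configuration always serializes to the same target string.
    response = json.dumps(config, sort_keys=True)
    return {"prompt": prompt, "response": response}


def parse_config(response):
    """Turn generated text back into a knob dictionary."""
    return json.loads(response)


# Illustrative knob names and values only; not taken from the paper.
sample = build_sample(
    ["SELECT count(*) FROM orders WHERE o_totalprice > 1000;"],
    {"shared_buffers_mb": 4096, "work_mem_mb": 64},
)
recovered = parse_config(sample["response"])
```

A collection of such samples could feed a standard supervised fine-tuning loop. At inference time, only the prompt is supplied, and the generated text is parsed back into knob settings with something like `parse_config`.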

Paper Structure (47 sections, 2 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Previous knob tuning methods vs. E2ETune.
  • Figure 2: Overview of E2ETune (refer to the overview section for a detailed explanation).
  • Figure 3: Best performance improvement over tuning time across 10 representative benchmarks. "MP" abbreviates Model Pre-training. (top-left is better)
  • Figure 4: Maximum performance improvement over tuning time on three real-world benchmarks. (top-left is better)
  • Figure 5: Ablation study of the scale of training data.