Is Large Language Model Good at Database Knob Tuning? A Comprehensive Experimental Evaluation

Yiyan Li, Haoyang Li, Zhao Pu, Jing Zhang, Xinyi Zhang, Tao Ji, Luming Sun, Cuiping Li, Hong Chen

TL;DR

This work demonstrates that large language models can effectively substitute for several components of database knob tuning, treating knob pruning, model initialization, and knob recommendation as prompt-driven tasks. Across extensive experiments with OLTP SYSBENCH on MySQL and broader generalization tests to OLAP workloads, engines, and hardware, LLMs achieve performance on par with or exceeding traditional methods while offering interpretable, chain-of-thought explanations. The results show notable generalizability without retraining, and several open questions point to future directions such as fine-tuning, retrieval-augmented prompts, and end-to-end LLM-based tuning pipelines. Overall, the study highlights a promising avenue for AI-driven database management tasks beyond tuning, including query optimization and index recommendation.

Abstract

Knob tuning plays a crucial role in optimizing databases: adjusting configuration knobs improves database performance. However, traditional tuning methods often follow a Try-Collect-Adjust approach, which is inefficient and database-specific. These methods are also often opaque, making it hard for DBAs to understand the underlying decision-making process. Large language models (LLMs) such as GPT-4 and Claude-3 excel at complex natural language tasks, yet their potential in database knob tuning remains largely unexplored. This study uses LLMs as experienced DBAs for knob-tuning tasks with carefully designed prompts. We identify three key subtasks in the tuning system (knob pruning, model initialization, and knob recommendation) and propose LLM-driven solutions to replace the conventional method for each subtask. We conduct extensive experiments comparing LLM-driven approaches against traditional methods across these subtasks to evaluate LLMs' efficacy in the knob tuning domain. Furthermore, we explore the adaptability of LLM-based solutions in diverse evaluation settings, encompassing new benchmarks, database engines, and hardware environments. Our findings reveal that LLMs not only match or surpass traditional methods but also exhibit notable interpretability by generating responses in a coherent "chain-of-thought" manner. We further observe that LLMs generalize well through simple adjustments to prompts, without additional training or extensive code modifications. Drawing on our experimental findings, we identify several opportunities for future research on using LLMs in database management.
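To make the idea of a prompt-driven subtask concrete, here is a minimal Python sketch of how a knob pruning prompt might be assembled and its answer parsed. It is an illustration under stated assumptions, not the prompt or code used in the paper: the function names, the prompt wording, the hardware description, and the JSON-on-the-last-line answer format are all hypothetical, and the LLM reply is mocked rather than fetched from a real model. Only the MySQL knob names are real.

```python
import json


def build_pruning_prompt(engine, workload, hardware, knobs, top_k):
    """Assemble a knob-pruning prompt (illustrative wording, not the paper's)."""
    knob_lines = "\n".join(f"- {name}: {desc}" for name, desc in knobs.items())
    return (
        f"You are an experienced DBA tuning {engine}.\n"
        f"Workload: {workload}\n"
        f"Hardware: {hardware}\n"
        f"Candidate knobs:\n{knob_lines}\n"
        f"Think step by step, then list the {top_k} knobs that matter most "
        f"for throughput as a JSON array of knob names on the last line."
    )


def parse_selected_knobs(response, knobs):
    """Read the JSON array on the last line and keep only known knob names."""
    last_line = response.strip().splitlines()[-1]
    names = json.loads(last_line)
    return [name for name in names if name in knobs]


# Real MySQL knob names with short, informal descriptions.
KNOBS = {
    "innodb_buffer_pool_size": "memory for caching table and index data",
    "innodb_log_file_size": "size of each redo log file",
    "max_connections": "maximum number of simultaneous client connections",
}

# The workload and hardware strings are made-up inputs for the example.
prompt = build_pruning_prompt(
    "MySQL", "SYSBENCH OLTP read-write", "8 vCPU, 16 GB RAM", KNOBS, top_k=2
)

# Mocked LLM reply: free-text reasoning followed by the JSON answer line.
mock_response = (
    "The buffer pool and redo log size dominate write-heavy OLTP throughput.\n"
    '["innodb_buffer_pool_size", "innodb_log_file_size", "not_a_knob"]'
)
selected = parse_selected_knobs(mock_response, KNOBS)
```

In this sketch, moving to a different engine, benchmark, or machine only changes the arguments passed to `build_pruning_prompt`, which mirrors the abstract's point that generalization comes from adjusting prompts rather than retraining. Filtering the reply against the candidate set is a defensive choice so that a hallucinated knob name is dropped instead of being passed downstream.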

Paper Structure

This paper contains 33 sections, 5 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Overview of a knob tuning system. "Knob Rec." stands for "Knob Recommendation". Knob pruning and model initialization serve as optional components within the system, designed to expedite the tuning process of the knob recommendation methods.
  • Figure 2: The prompt used to perform the knob selection task.
  • Figure 3: The prompt to perform the model initialization and knob recommendation tasks.
  • Figure 4: Best database performance over iterations. The horizontal axis represents the number of tuning iterations and the vertical axis represents the best TPS achieved (upper-left better). Different knob pruning methods result in different convergence speeds and optimal performance.
  • Figure 5: Illustration of tuning suggestions offered by LLMs.
  • ...and 1 more figures