Table of Contents
Fetching ...

Why Database Manuals Are Not Enough: Efficient and Reliable Configuration Tuning for DBMSs via Code-Driven LLM Agents

Xinyi Zhang, Tiantian Chen, Zhentao Han, Zhaoyan Hong, Wei Lu, Sheng Wang, Mo Sha, Anni Wang, Shuang Liu, Yakun Zhang, Feifei Li, Xiaoyong Du

Abstract

Modern database management systems (DBMSs) expose hundreds of configuration knobs that critically influence performance. Existing automated tuning methods either adopt a data-driven paradigm, which incurs substantial overhead, or rely on manual-driven heuristics extracted from database documentation, which are often limited and overly generic. Motivated by the fact that the control logic of configuration knobs is inherently encoded in the DBMS source code, we argue that promising tuning strategies can be mined directly from the code, uncovering fine-grained insights grounded in system internals. To this end, we propose SysInsight, a code-driven database tuning system that automatically extracts fine-grained tuning knowledge from DBMS source code to accelerate and stabilize the tuning process. SysInsight combines static code analysis with LLM-based reasoning to identify knob-controlled execution paths and extract semantic tuning insights. These insights are then transformed into quantitative and verifiable tuning rules via association rule mining grounded in tuning observations. During online tuning, system diagnosis is applied to identify critical knobs, which are adjusted under the rule guidance. Evaluations demonstrate that compared to the SOTA baseline, SysInsight converges to the best configuration on average 7.11X faster while achieving a 19.9% performance improvement.

Why Database Manuals Are Not Enough: Efficient and Reliable Configuration Tuning for DBMSs via Code-Driven LLM Agents

Abstract

Modern database management systems (DBMSs) expose hundreds of configuration knobs that critically influence performance. Existing automated tuning methods either adopt a data-driven paradigm, which incurs substantial overhead, or rely on manual-driven heuristics extracted from database documentation, which are often limited and overly generic. Motivated by the fact that the control logic of configuration knobs is inherently encoded in the DBMS source code, we argue that promising tuning strategies can be mined directly from the code, uncovering fine-grained insights grounded in system internals. To this end, we propose SysInsight, a code-driven database tuning system that automatically extracts fine-grained tuning knowledge from DBMS source code to accelerate and stabilize the tuning process. SysInsight combines static code analysis with LLM-based reasoning to identify knob-controlled execution paths and extract semantic tuning insights. These insights are then transformed into quantitative and verifiable tuning rules via association rule mining grounded in tuning observations. During online tuning, system diagnosis is applied to identify critical knobs, which are adjusted under the rule guidance. Evaluations demonstrate that compared to the SOTA baseline, SysInsight converges to the best configuration on average 7.11X faster while achieving a 19.9% performance improvement.
Paper Structure (33 sections, 6 equations, 11 figures, 1 table, 1 algorithm)

This paper contains 33 sections, 6 equations, 11 figures, 1 table, 1 algorithm.

Figures (11)

  • Figure 1: Motivating Example. Manual-driven approach only provides generic information for knob spin_wait_delay (i.e., innodb_spin_wait_delay), requiring further trial-and-error. SysInsight addresses this gap by analyzing the knob-controlled functions (e.g., ut_delay, sync_array_wait_event) to derive fine-grained tuning guidance rooted in system internals.
  • Figure 2: Simplified Code Snippets Showing How Knobs Influence System Behavior: (a) innodb_log_buffer_size, implemented internally via log->buf_size, determines whether a log buffer flush is triggered. (b) innodb_spin_wait_delay, implemented internally via srv_spin_wait_delay, controls the polling frequency in the spin-lock loop.
  • Figure 3: SysInsight Workflow.
  • Figure 4: Best Performance Over Iterations. For TPC-C and SYSBENCH, top-left is better. For TPC-H, bottom-left is better.
  • Figure 5: Reliability Comparison. Left figure shows #bad configurations (low is better). Right figure shows cumulative improvement over the default (higher is better).
  • ...and 6 more figures

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4