Detecting Logic Bugs of Join Optimizations in DBMS

Xiu Tang, Sai Wu, Dongxiang Zhang, Feifei Li, Gang Chen

TL;DR

Experimental results show that TQS is effective at finding logic bugs of join optimization in database management systems: it detected 115 bugs within 24 hours.

Abstract

Generation-based testing techniques have proven effective at detecting logic bugs in DBMSs, which are often caused by incorrect implementations of query optimizers. However, existing generation-based debugging tools are limited to single-table queries, leaving a substantial research gap for multi-table queries with join operators. In this paper, we propose TQS, a novel testing framework for detecting logic bugs triggered by queries involving multi-table joins. Given a target DBMS, TQS achieves this with two key components: Data-guided Schema and Query Generation (DSG) and Knowledge-guided Query Space Exploration (KQE). DSG addresses the key challenge of multi-table query debugging: how to generate ground-truth (query, result) pairs for verification. It adopts database normalization to generate a testing schema and maintains a bitmap index for result tracking. To improve debugging efficiency, DSG also injects artificial noise into the generated data. To avoid repeatedly searching the same query space, KQE formulates the problem as isomorphic graph set discovery and combines graph embedding with weighted random walks for query generation. We evaluated TQS on four popular DBMSs: MySQL, MariaDB, TiDB, and the gray release of an industry-leading cloud-native database, anonymized as X-DB. Experimental results show that TQS is effective at finding logic bugs of join optimization in database management systems. It detected 115 bugs within 24 hours: 31 in MySQL, 30 in MariaDB, 31 in TiDB, and 23 in X-DB.
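The ground-truth idea behind DSG can be illustrated with a minimal sketch. It assumes a hypothetical wide table (`WIDE`) whose rows are already fully joined records. Normalizing it into smaller tables means the wide table itself is the expected result of joining them back. The table contents, column names, and helper functions here are illustrative assumptions, not taken from the paper. The nested-loop join stands in for the DBMS under test, and the bitmap index and noise injection are omitted.

```python
# Hypothetical wide table: each row is one fully joined record.
WIDE = [
    {"order_id": 1, "user_id": 10, "user_name": "ann", "item": "pen"},
    {"order_id": 2, "user_id": 10, "user_name": "ann", "item": "ink"},
    {"order_id": 3, "user_id": 11, "user_name": "bob", "item": "pad"},
]


def split_wide_table(wide):
    """Normalize the wide table into a users table and an orders table."""
    users = {}
    orders = []
    for row in wide:
        # user_name depends only on user_id, so it moves to its own table.
        users[row["user_id"]] = {
            "user_id": row["user_id"],
            "user_name": row["user_name"],
        }
        orders.append(
            {"order_id": row["order_id"], "user_id": row["user_id"], "item": row["item"]}
        )
    return list(users.values()), orders


def inner_join(orders, users):
    """Naive nested-loop inner join on user_id (stand-in for the tested DBMS)."""
    return [
        {**o, "user_name": u["user_name"]}
        for o in orders
        for u in users
        if o["user_id"] == u["user_id"]
    ]


def canonical(rows):
    """Order-independent representation of a result set."""
    return sorted(tuple(sorted(r.items())) for r in rows)


def join_matches_ground_truth(wide):
    """Compare the join of the normalized tables against the wide table."""
    users, orders = split_wide_table(wide)
    return canonical(inner_join(orders, users)) == canonical(wide)
```

In this sketch, a mismatch reported by `join_matches_ground_truth` would point to a bug in the join implementation, since the wide table is correct by construction.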

Paper Structure (20 sections, 6 equations, 10 figures, 5 tables, 2 algorithms)

Figures (10)

  • Figure 1: Logic bug cases of join optimizations in MySQL.
  • Figure 2: Overview of TQS. TQS uses DSG (Data-guided Schema and Query Generation) and KQE (Knowledge-guided Query Space Exploration) to detect logic bugs of join optimizations in DBMS.
  • Figure 3: Schema generation of the shopping order dataset. Data in black is the original dataset, and data in color is the noisy data, which is injected into the schema tables and then synchronized to the wide table.
  • Figure 4: RowID map table $T_{RowIDMap}$ and the join bitmap index are built to retrieve the ground-truth of query joins. Data in color represents data updates after noise injection.
  • Figure 5: Example of join query generation. The join expressions are generated by random walk on the schema graph.
  • ...and 5 more figures
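
The random-walk join generation named in the Figure 5 caption can be sketched as follows. This is a simplified illustration under stated assumptions: the schema graph `SCHEMA_GRAPH`, its table and column names, and the function `random_walk_join` are hypothetical, and the walk is uniform rather than the weighted, embedding-guided walk that the abstract attributes to KQE.

```python
import random

# Hypothetical schema graph: each table maps to (neighbor table, join column).
SCHEMA_GRAPH = {
    "orders": [("users", "user_id"), ("goods", "goods_id")],
    "users": [("orders", "user_id")],
    "goods": [("orders", "goods_id")],
}


def random_walk_join(graph, start, steps, rng):
    """Build a join query by walking the schema graph from `start`.

    Each step moves to a randomly chosen unvisited neighbor and emits one
    JOIN clause on the column shared along that edge.
    """
    visited = [start]
    parts = [f"SELECT * FROM {start}"]
    current = start
    for _ in range(steps):
        candidates = [(t, c) for t, c in graph[current] if t not in visited]
        if not candidates:
            break  # dead end: every neighbor is already in the query
        nxt, col = rng.choice(candidates)
        parts.append(f"JOIN {nxt} ON {current}.{col} = {nxt}.{col}")
        visited.append(nxt)
        current = nxt
    return " ".join(parts)
```

Calling `random_walk_join(SCHEMA_GRAPH, "orders", 2, random.Random())` yields a query that joins `orders` with either `users` or `goods`. A weighted variant would replace `rng.choice` with a draw biased toward less-explored edges.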

Theorems & Definitions (3)

  • definition 1
  • definition 2
  • definition 3