Table of Contents
Fetching ...

Indexing Join Inputs for Fast Queries and Maintenance

Wenhui Lyu, Goetz Graefe

TL;DR

The paper tackles the long-standing tension between fast query processing and low-maintenance intermediate storage for joins. It proposes the merged index, a proxy-store that interleaves two single-table indexes to support all join types while enabling non-blocking query processing and efficient incremental maintenance, and demonstrates it can be implemented on $B$-trees or log-structured merge-forests. Through extensive experiments on a binary join using TPC-C data, the merged index achieves query performance on par with materialized join views and maintenance efficiency comparable to traditional indexes, with only modest space overhead and natural compression of join results. The findings suggest the merged index is a practical, industrially viable storage primitive that combines the strengths of traditional indexes and materialized views for scalable, non-blocking database query processing.

Abstract

In database systems, joins are often expensive despite many years of research producing numerous join algorithms. Precomputed and materialized join views deliver the best query performance, whereas traditional indexes, used as pre-sorted inputs for merge joins, permit very efficient maintenance. Neither traditional indexes nor materialized join views require blocking phases, in contrast to query-time sorting and transient indexes, e.g., hash tables in hash joins, that impose high memory requirements and possibly spill to temporary storage. Here, we introduce a hybrid of traditional indexing and materialized join views. The *merged index* can be implemented with traditional b-trees, permits high-bandwidth maintenance using log-structured merge-forests, supports all join types (inner joins, all outer joins, all semi joins), and enables non-blocking query processing. Experiments across a wide range of scenarios confirm its query performance comparable to materialized join views and maintenance efficiency comparable to traditional indexes.

Indexing Join Inputs for Fast Queries and Maintenance

TL;DR

The paper tackles the long-standing tension between fast query processing and low-maintenance intermediate storage for joins. It proposes the merged index, a proxy-store that interleaves two single-table indexes to support all join types while enabling non-blocking query processing and efficient incremental maintenance, and demonstrates it can be implemented on -trees or log-structured merge-forests. Through extensive experiments on a binary join using TPC-C data, the merged index achieves query performance on par with materialized join views and maintenance efficiency comparable to traditional indexes, with only modest space overhead and natural compression of join results. The findings suggest the merged index is a practical, industrially viable storage primitive that combines the strengths of traditional indexes and materialized views for scalable, non-blocking database query processing.

Abstract

In database systems, joins are often expensive despite many years of research producing numerous join algorithms. Precomputed and materialized join views deliver the best query performance, whereas traditional indexes, used as pre-sorted inputs for merge joins, permit very efficient maintenance. Neither traditional indexes nor materialized join views require blocking phases, in contrast to query-time sorting and transient indexes, e.g., hash tables in hash joins, that impose high memory requirements and possibly spill to temporary storage. Here, we introduce a hybrid of traditional indexing and materialized join views. The *merged index* can be implemented with traditional b-trees, permits high-bandwidth maintenance using log-structured merge-forests, supports all join types (inner joins, all outer joins, all semi joins), and enables non-blocking query processing. Experiments across a wide range of scenarios confirm its query performance comparable to materialized join views and maintenance efficiency comparable to traditional indexes.

Paper Structure

This paper contains 34 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Example: A join operator with additional query processing steps.
  • Figure 2: Query performance of merged index, traditional indexes, and materialized join views, reported as query throughput ratios.
  • Figure 3: CPU utilization of join queries for three approaches
  • Figure 4: Maintenance efficiency of merged index, traditional indexes, and materialized join view, reported as update throughput ratios. The merged index roughly matches traditional indexes and significantly outperforms materialized join views.
  • Figure 5: Loading time of the full database in log-structured merge-forests. Blue solid bars display the loading time of core storage structures, while the hatched bars show the loading time of other tables in the database. Records of two tables are loaded into a merged index without sorting first, simulating index creation. A merged index only loads slightly slower than equivalent single-table indexes.
  • ...and 3 more figures