A Unified Architecture for Efficient Binary and Worst-Case Optimal Join Processing

Amirali Kaboli, Alex Mascolo, Amir Shaikhha

TL;DR

This work tackles the inefficiency of traditional join processing in queries with large intermediate results by introducing a unified architecture that supports both binary joins and worst-case optimal joins (WCOJ) through SDQL-generated C++ code. It combines a planning transformation to Free Join plans, SDQL program generation with targeted optimizations, and efficient C++ code emission, enabling both hash-based and sort-based WCOJ execution. Empirical results show consistent performance gains over state-of-the-art methods (up to $3.1\times$ vs Generic Join and $4.8\times$ vs Free Join) across JOB and LSQB benchmarks, with significant improvements from optimizations like dictionary specialization, early projection/aggregation, and a hybrid hashing-sort approach. The solution offers practical impact by providing a flexible, high-performance platform that can adapt to data characteristics and query patterns, while pointing to future enhancements in lazy evaluation, parallelism, and an optimizer tuned to this unified framework.

Abstract

Join processing is a fundamental operation in database management systems; however, traditional join algorithms often become inefficient on complex queries whose intermediate results are much larger than the final query output. Worst-case optimal join (WCOJ) algorithms are a significant advance: they offer asymptotically better performance by avoiding the enumeration of potentially exploding intermediate results. In this paper, we propose a unified architecture that efficiently supports both traditional binary joins and WCOJ processing. Unlike state-of-the-art systems, which focus on either hash-based or sort-based join implementations, our system accommodates both physical implementations of binary joins and WCOJ algorithms. Experimental evaluations demonstrate that our system achieves performance gains of up to 3.1x (on average 1.5x) and 4.8x (on average 1.4x) over the state-of-the-art implementations of Generic Join and Free Join, respectively, across acyclic and cyclic queries in standard query benchmarks.
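To make the intermediate-result problem concrete, the following C++ sketch evaluates the triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c) in two ways: a binary plan that first joins R and S, and a hash-based, attribute-at-a-time plan in the style of Generic Join. It is an illustration only, not the SDQL-generated code the paper describes. The names `Index`, `JoinStats`, `triangles_binary`, `triangles_generic`, and `make_skewed_instance` are hypothetical, and relations are assumed to be duplicate-free sets of integer pairs.

```cpp
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;
using Relation = std::vector<Edge>;
// Two-level hash index: first column -> set of second-column values.
using Index = std::unordered_map<int, std::unordered_set<int>>;

Index build_index(const Relation& rel) {
    Index idx;
    for (const Edge& e : rel) idx[e.first].insert(e.second);
    return idx;
}

// probes: candidate (a, b, c) combinations examined; output: triangles found.
struct JoinStats { std::size_t probes; std::size_t output; };

// Binary plan: enumerate R join S on b, then check each tuple against T(a, c).
JoinStats triangles_binary(const Relation& R, const Relation& S, const Relation& T) {
    Index s_by_b = build_index(S);
    Index t_by_a = build_index(T);
    JoinStats stats{0, 0};
    for (const Edge& r : R) {
        auto s_it = s_by_b.find(r.second);
        if (s_it == s_by_b.end()) continue;
        for (int c : s_it->second) {
            ++stats.probes;  // one intermediate tuple of R join S
            auto t_it = t_by_a.find(r.first);
            if (t_it != t_by_a.end() && t_it->second.count(c)) ++stats.output;
        }
    }
    return stats;
}

// Attribute-at-a-time plan: bind a, then b, then intersect the c-candidates
// of S[b] and T[a], always iterating the smaller of the two sets.
JoinStats triangles_generic(const Relation& R, const Relation& S, const Relation& T) {
    Index r = build_index(R), s = build_index(S), t = build_index(T);
    JoinStats stats{0, 0};
    for (const auto& entry : r) {
        auto t_it = t.find(entry.first);
        if (t_it == t.end()) continue;  // a must occur in both R and T
        const std::unordered_set<int>& t_cs = t_it->second;
        for (int b : entry.second) {
            auto s_it = s.find(b);
            if (s_it == s.end()) continue;  // b must occur in both R and S
            const std::unordered_set<int>& s_cs = s_it->second;
            const auto& small = s_cs.size() <= t_cs.size() ? s_cs : t_cs;
            const auto& large = s_cs.size() <= t_cs.size() ? t_cs : s_cs;
            for (int c : small) {
                ++stats.probes;
                if (large.count(c)) ++stats.output;
            }
        }
    }
    return stats;
}

// Hypothetical skewed input: R = {(a, 0)}, S = {(0, c)} for a, c in 1..n,
// and T = {(1, 1)}, so R join S has n * n tuples but only one triangle exists.
struct Instance { Relation R, S, T; };

Instance make_skewed_instance(int n) {
    Instance inst;
    for (int i = 1; i <= n; ++i) {
        inst.R.push_back({i, 0});
        inst.S.push_back({0, i});
    }
    inst.T.push_back({1, 1});
    return inst;
}
```

On `make_skewed_instance(100)`, the binary plan enumerates 10,000 intermediate tuples while the attribute-at-a-time plan examines a single candidate, and both report the same single triangle. This is the gap that WCOJ algorithms are designed to close; on inputs without such skew, a binary plan can be just as fast or faster, which is one motivation for supporting both in the same system.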

Paper Structure

This paper contains 26 sections, 6 equations, 27 figures.

Figures (27)

  • Figure 1: SDQL.
  • Figure 2: C++.
  • Figure 4: An overview of our system architecture.
  • Figure 5: SDQL.
  • Figure 6: C++.
  • ...and 22 more figures