Table of Contents
Fetching ...

Scaling and Load-Balancing Equi-Joins

Ahmed Metwally

TL;DR

This article proposes Adaptive-Multistage-Join (AM-Join) for scalable and fast equi-joins in distributed shared-nothing architectures and proposes the Index-Broadcast-Join (IB-Join) family of algorithms for Small-Large joins, where one table fits in memory and the other can be up to orders of magnitude larger.

Abstract

The task of joining two tables is fundamental for querying databases. In this paper, we focus on the equi-join problem, where a pair of records from the two joined tables are part of the join results if equality holds between their values in the join column(s). While this is a tractable problem when the number of records in the joined tables is relatively small, it becomes very challenging as the table sizes increase, especially if hot keys (join column values with a large number of records) exist in both joined tables. This paper, an extended version of [metwally-SIGMOD-2022], proposes Adaptive-Multistage-Join (AM-Join) for scalable and fast equi-joins in distributed shared-nothing architectures. AM-Join utilizes (a) Tree-Join, a proposed novel algorithm that scales well when the joined tables share hot keys, and (b) Broadcast-Join, the known fastest when joining keys that are hot in only one table. Unlike the state-of-the-art algorithms, AM-Join (a) holistically solves the join-skew problem by achieving load balancing throughout the join execution, and (b) supports all outer-join variants without record deduplication or custom table partitioning. For the fastest AM-Join outer-join performance, we propose the Index-Broadcast-Join (IB-Join) family of algorithms for Small-Large joins, where one table fits in memory and the other can be up to orders of magnitude larger. The outer-join variants of IB-Join improves on the state-of-the-art Small-Large outer-join algorithms. The proposed algorithms can be adopted in any shared-nothing architecture. We implemented a MapReduce version using Spark. Our evaluation shows the proposed algorithms execute significantly faster and scale to more skewed and orders-of-magnitude bigger tables when compared to the state-of-the-art algorithms.

Scaling and Load-Balancing Equi-Joins

TL;DR

This article proposes Adaptive-Multistage-Join (AM-Join) for scalable and fast equi-joins in distributed shared-nothing architectures and proposes the Index-Broadcast-Join (IB-Join) family of algorithms for Small-Large joins, where one table fits in memory and the other can be up to orders of magnitude larger.

Abstract

The task of joining two tables is fundamental for querying databases. In this paper, we focus on the equi-join problem, where a pair of records from the two joined tables are part of the join results if equality holds between their values in the join column(s). While this is a tractable problem when the number of records in the joined tables is relatively small, it becomes very challenging as the table sizes increase, especially if hot keys (join column values with a large number of records) exist in both joined tables. This paper, an extended version of [metwally-SIGMOD-2022], proposes Adaptive-Multistage-Join (AM-Join) for scalable and fast equi-joins in distributed shared-nothing architectures. AM-Join utilizes (a) Tree-Join, a proposed novel algorithm that scales well when the joined tables share hot keys, and (b) Broadcast-Join, the known fastest when joining keys that are hot in only one table. Unlike the state-of-the-art algorithms, AM-Join (a) holistically solves the join-skew problem by achieving load balancing throughout the join execution, and (b) supports all outer-join variants without record deduplication or custom table partitioning. For the fastest AM-Join outer-join performance, we propose the Index-Broadcast-Join (IB-Join) family of algorithms for Small-Large joins, where one table fits in memory and the other can be up to orders of magnitude larger. The outer-join variants of IB-Join improves on the state-of-the-art Small-Large outer-join algorithms. The proposed algorithms can be adopted in any shared-nothing architecture. We implemented a MapReduce version using Spark. Our evaluation shows the proposed algorithms execute significantly faster and scale to more skewed and orders-of-magnitude bigger tables when compared to the state-of-the-art algorithms.
Paper Structure (46 sections, 13 equations, 18 figures, 4 tables, 22 algorithms)

This paper contains 46 sections, 13 equations, 18 figures, 4 tables, 22 algorithms.

Figures (18)

  • Figure 1: An example for joining two relations. (a) shows the input relations. (b) through (e) show the results of the inner, left-outer, right-outer, and full-outer-joins, respectively. The tables are sorted for readability, but sorting is not guaranteed in practice.
  • Figure 2: A schematic example for joining a specific hot key, $k_1$, using the multi-executor-per-key Shuffle-Join algorithms afrati2010optimizingokcan2011processingbeame2014skewli2018submodularity. (a) The $\mathcal{R}$ and $\mathcal{S}$ relations to be joined. (b) The records with key $k_1$ replicated among the rows and columns of the grid of executors assigned to $k_1$.
  • Figure 3: A schematic example for joining a specific key, $k_1$, using the basic Tree-Join algorithm. (a) The $\mathcal{R}$ and $\mathcal{S}$ relations to be joined. (b) The $k_1$ joined lists in the initial joined index formed using groupByKey. Each of the joined lists are chunked into two sub-lists (to simplify the figure). (c) Each sub-list from $\mathcal{R}$ is produced with all the sub-lists from $\mathcal{S}$. These pairs of sublists can be processed by different executors in the next iteration. (b) and (c) together show the top two levels of the tree formed by the basic Tree-Join algorithm.
  • Figure 4: An example of inner-joining two relations using treeJoinBasic. (a) The input relations. (b) The initial index built from the input relations in (a). The initial index is used to produce (c) the first partial results, and (d) the first joined index. The first joined index is used to produce (e) the second partial results, and (f) the second joined index. The second joined index is used to produce (g) the third partial results. The union of the partial results in (c), (e), and (g) constitutes the inner-join results.
  • Figure 5: A schematic example for joining a specific hot key, $k_1$, using the load-balanced Tree-Join algorithm. (a) $\mathcal{R}_{H}$ and $\mathcal{S}_{H}$, the respective hot splits of the $\mathcal{R}$ and $\mathcal{S}$ relations to be joined. (b) The $k_1$ records replicated among multiple augmented $k_1$ keys in the unraveled forms of $\mathcal{R}_{H}$ and $\mathcal{S}_{H}$, $unraveled_{\mathcal{R}_{H}}$ and $unraveled_{\mathcal{S}_{H}}$, respectively. (c) The records of the unravelled relations grouped by the augmented keys using groupByKey to form the first joined index, $joined\_index_{AK}$.
  • ...and 13 more figures