Parallel Query Processing with Heterogeneous Machines

Simon Frisk, Paraschos Koutris

TL;DR

This work addresses computing full Conjunctive Queries in a one-round parallel setting with heterogeneous machines, each having a distinct cost function $g_c$. By partitioning the output space into machine-specific hyperrectangles and packing these regions, the authors extend the HyperCube approach to heterogeneous costs, deriving upper bounds $L^{\textsf{upper}}_{\mathbf{v}}$ and matching lower bounds via fractional edge packings. They show optimal bounds for equal-cardinality inputs under linear and broader cost models, and establish bounds for several queries under unequal cardinalities, including the Cartesian product, binary join, star, and triangle. The results have implications for practical, disaggregated storage systems and topology-aware data processing, offering principled load-balancing strategies that respect machine heterogeneity. The paper also introduces generalized cost-function analysis and outlines future work for broader CQ classes.

Abstract

We study the problem of computing a full Conjunctive Query in parallel using $p$ heterogeneous machines. Our computational model is similar to the MPC model, but each machine has its own cost function mapping the number of bits it receives to a cost. An optimal algorithm should minimize the maximum cost across all machines. We consider algorithms over a single communication round and give a lower bound and matching upper bound for databases where each relation has the same cardinality. We do this not only for linear cost functions, as in previous work, but also for more general cost functions. For databases with relations of different cardinalities, we also find a lower bound, and give matching upper bounds for specific queries such as the Cartesian product, the join, the star query, and the triangle query. Our approach is inspired by the HyperCube algorithm, but there are additional challenges involved when machines have heterogeneous cost functions.
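
The objective in the abstract (minimize the maximum per-machine cost) can be illustrated with a minimal Python sketch. The helper names `max_cost` and `linear_cost`, the weights, and the bit counts are all hypothetical and not taken from the paper. The sketch also holds the total number of bits fixed, whereas in real query evaluation tuples may be replicated to several machines.

```python
from typing import Callable, Sequence

# A cost function maps the number of bits a machine receives to a cost.
CostFn = Callable[[int], float]


def linear_cost(weight: float) -> CostFn:
    """Hypothetical linear cost model: cost = weight * bits received."""
    return lambda bits: weight * bits


def max_cost(bits_received: Sequence[int], cost_fns: Sequence[CostFn]) -> float:
    """Objective value of an allocation: the largest cost over all machines."""
    if len(bits_received) != len(cost_fns):
        raise ValueError("need exactly one bit count per machine")
    return max(g(b) for g, b in zip(cost_fns, bits_received))


# Three machines; a larger weight means a slower (more expensive) machine.
cost_fns = [linear_cost(1.0), linear_cost(2.0), linear_cost(4.0)]

# Splitting 2100 bits evenly ignores heterogeneity.
even = max_cost([700, 700, 700], cost_fns)

# Splitting in proportion to 1/weight equalizes the per-machine costs.
balanced = max_cost([1200, 600, 300], cost_fns)

print(even, balanced)
```

With these illustrative numbers, the even split is dominated by the slowest machine, while the weight-aware split gives every machine the same cost. This is the kind of imbalance a heterogeneity-aware algorithm has to avoid.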

Paper Structure

This paper contains 30 sections, 51 theorems, 38 equations, 2 figures, 1 algorithm.

Key Result

Theorem 1

Let $q$ be a full CQ with uniform arity $r$, and let $I$ be a $\theta$-dense input with domain $[n]$ (that is, every relation has size $m=\theta n^r$). Then, for every fractional vertex cover $\mathbf{v}$, we can evaluate $q$ in one round in the linear cost model with load $O(L^{\textsf{upper}}_\mathbf{v})$.
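
As a quick check on the density parameter, take illustrative values that are not from the paper: $n = 100$, $r = 2$, and $\theta = 0.1$. Each relation then has size

$$m = \theta n^r = 0.1 \cdot 100^2 = 1000,$$

so each binary relation contains one tenth of the $n^r = 10{,}000$ possible tuples over the domain $[n]$.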

Figures (2)

  • Figure 1: One way to pack the machines in the example.
  • Figure 2: $\gamma$ as a function of $\theta$.

Theorems & Definitions (61)

  • Theorem 1: Dense Inputs
  • Theorem 2: Sparse Inputs
  • Theorem 3
  • Example 4
  • Theorem 5: Packing Theorem
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 51 more