Work-Efficient Query Evaluation with PRAMs
Jens Keppeler, Thomas Schwentick, Christopher Spinrath
TL;DR
This work investigates work-efficient constant-time parallel query evaluation on CRCW PRAMs. Although relational algebra can be evaluated in constant time, representing the output compactly and removing duplicates can incur large work costs. The authors develop a framework that combines approximate prefix sums, approximate compaction, and padded sorting to obtain weakly work-efficient constant-time algorithms in several settings. For dictionary-based representations, they show that acyclic join queries and free-connex acyclic join queries achieve near worst-case-optimal work bounds, that semijoin algebra queries admit work-optimal evaluation, and that join queries can be evaluated in constant time within the worst-case optimal framework. They extend these results to the ordered and general settings via reductions to the dictionary setting, and they outline several open questions, including dynamic maintenance and possible improvements based on PANDA.
Abstract
The article studies query evaluation in parallel constant time in the CRCW PRAM model. While it is well known that all relational algebra queries can be evaluated in constant time on an appropriate CRCW PRAM model, this article is interested in the efficiency of evaluation algorithms, that is, in the number of processors or, asymptotically equivalently, in the work. Naive evaluation in the parallel setting results in huge (polynomial) bounds on the work of such algorithms and in representations of the result sets that can be extremely scattered in memory. The article discusses some obstacles to constant-time PRAM query evaluation. It presents algorithms for relational operators and explores three settings in which efficient sequential query evaluation algorithms exist: acyclic queries, semijoin algebra queries, and join queries, the latter in the worst-case optimal framework. Under mild assumptions (that data values are numbers of polynomial size in the size of the database, or that the relations of the database are suitably sorted), constant-time algorithms are presented that are weakly work-efficient in the sense that work $\mathcal{O}(T^{1+\varepsilon})$ can be achieved, for every $\varepsilon>0$, compared to the time $T$ of an optimal sequential algorithm. Important tools are the algorithms for approximate prefix sums and compaction from Goldberg and Zwick (1995).
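
To make the role of prefix sums and compaction concrete, the following is a minimal sequential sketch in Python of how prefix sums turn a scattered result array into a contiguous one. It is an illustration only: it uses exact prefix sums and runs on a single processor, whereas the constant-time algorithms cited in the abstract compute the offsets only approximately, which this sketch does not attempt. The function name `compact` and the use of `None` to mark empty cells are assumptions made for this example.

```python
from itertools import accumulate


def compact(cells):
    """Move the non-empty cells (those that are not None) to the front.

    Hypothetical helper for illustration; exact prefix sums, sequential.
    """
    # flags[i] is 1 if cell i holds a result tuple, 0 if it is empty.
    flags = [0 if c is None else 1 for c in cells]
    # Inclusive prefix sums: prefix[i] counts non-empty cells in cells[0..i].
    prefix = list(accumulate(flags))
    total = prefix[-1] if prefix else 0
    out = [None] * total
    # Each index i plays the role of one processor: its target position
    # depends only on prefix[i], so all writes are independent of each other.
    for i, c in enumerate(cells):
        if flags[i]:
            out[prefix[i] - 1] = c
    return out


scattered = [None, ("a", 1), None, None, ("b", 2), ("c", 3), None]
result = compact(scattered)
```

Once the offsets are known, every write in the final loop is independent of the others, which is why the cost of compaction is dominated by the cost of computing the prefix sums.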
