
Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offs

Toni Böhnlein, Pál András Papp, A. N. Yzelman

TL;DR

This is the first thorough study that combines pebbling and DAG scheduling problems, capturing the computation of general workloads on multiple processors with memory constraints and communication costs. Among its results is an extension of NP-hardness results for specific DAG classes from simpler models.

Abstract

The well-studied red-blue pebble game models the execution of an arbitrary computational DAG by a single processor over a two-level memory hierarchy. We present a natural generalization to a multiprocessor setting where each processor has its own limited fast memory, and all processors share unlimited slow memory. To our knowledge, this is the first thorough study that combines pebbling and DAG scheduling problems, capturing the computation of general workloads on multiple processors with memory constraints and communication costs. Our pebbling model enables us to analyze trade-offs between workload balancing, communication and memory limitations, and it captures real-world factors such as superlinear speedups due to parallelization. Our results include upper and lower bounds on the pebbling cost, an analysis of a greedy pebbling strategy, and an extension of NP-hardness results for specific DAG classes from simpler models. For our main technical contribution, we show two inapproximability results that already hold for the long-standing problem of standard red-blue pebbling: (i) the optimal I/O cost cannot be approximated to any finite factor, and (ii) the optimal total cost (I/O+computation) can only be approximated to a limited constant factor, i.e., it does not allow for a polynomial-time approximation scheme. These results also carry over naturally to our multiprocessor pebbling model.
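To make the single-processor game mentioned in the abstract concrete, here is a minimal sketch that replays a move sequence and counts its I/O cost. It assumes the classic rules as commonly stated: source nodes start with a blue pebble (slow memory), at most `r` red pebbles (fast memory) may be on the DAG at once, loads and saves cost one I/O each, and computing and deleting are free. The function name `io_cost`, the move encoding, and the `preds` dictionary are illustrative assumptions, not notation from the paper, and the sketch does not cover the multiprocessor model.

```python
def io_cost(preds, r, moves):
    """Replay a red-blue pebbling and return its I/O cost.

    preds: dict mapping each node to the list of its predecessors (hypothetical encoding).
    r:     maximum number of red pebbles allowed at any time.
    moves: list of (operation, node) pairs; operations are
           "load", "save", "compute", "delete".
    Raises ValueError if a move breaks the (assumed) classic rules.
    """
    nodes = set(preds)
    red = set()
    # Assumption: source nodes start with a blue pebble in slow memory.
    blue = {v for v in nodes if not preds[v]}
    cost = 0
    for op, v in moves:
        if op == "load":        # slow -> fast memory, one I/O
            if v not in blue:
                raise ValueError(f"cannot load {v}: no blue pebble")
            red.add(v)
            cost += 1
        elif op == "save":      # fast -> slow memory, one I/O
            if v not in red:
                raise ValueError(f"cannot save {v}: no red pebble")
            blue.add(v)
            cost += 1
        elif op == "compute":   # needs every input in fast memory
            if not preds[v] or any(u not in red for u in preds[v]):
                raise ValueError(f"cannot compute {v}")
            red.add(v)
        elif op == "delete":    # free a slot in fast memory
            red.discard(v)
        else:
            raise ValueError(f"unknown operation {op}")
        if len(red) > r:
            raise ValueError("fast memory limit exceeded")
    # The game ends when every sink has a blue pebble.
    sinks = nodes - {u for v in nodes for u in preds[v]}
    if not sinks <= blue:
        raise ValueError("some sink was never saved")
    return cost


# Tiny DAG: a and b are inputs, c depends on both.
preds = {"a": [], "b": [], "c": ["a", "b"]}
moves = [("load", "a"), ("load", "b"), ("compute", "c"), ("save", "c")]
cost = io_cost(preds, r=3, moves=moves)
print(cost)
```

With `r=3` the sequence is valid and uses two loads and one save. With `r=2` the same sequence is rejected, because computing `c` would require a third red pebble.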

Paper Structure

This paper contains 40 sections, 15 theorems, 10 equations, and 4 figures.

Key Result

Lemma 1

For any instance of MPP, we have $\frac{n}{k} \leq \texttt{OPT} \leq (g \cdot (\Delta_{in} + 1) + 1) \cdot n$.
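As a quick illustration of the Lemma 1 bounds, the sketch below evaluates both sides for concrete numbers. The symbols are not defined in this excerpt, so the reading used here is an assumption: $n$ is the number of DAG nodes, $k$ the number of processors, $g$ the cost of one I/O operation, and $\Delta_{in}$ the maximum in-degree. The function name `lemma1_bounds` and the sample values are hypothetical.

```python
from fractions import Fraction


def lemma1_bounds(n, k, g, max_in_degree):
    """Return (lower, upper) with lower <= OPT <= upper as stated in Lemma 1.

    Assumed meaning of the parameters (not defined in this excerpt):
    n = number of nodes, k = number of processors,
    g = cost of one I/O operation, max_in_degree = Delta_in.
    """
    lower = Fraction(n, k)                      # n / k, kept exact
    upper = (g * (max_in_degree + 1) + 1) * n   # (g * (Delta_in + 1) + 1) * n
    return lower, upper


# Hypothetical instance: 12 nodes, 4 processors, g = 2, Delta_in = 3.
lower, upper = lemma1_bounds(n=12, k=4, g=2, max_in_degree=3)
print(lower, upper)
```

For these sample values the lower bound is $12/4 = 3$ and the upper bound is $(2 \cdot 4 + 1) \cdot 12 = 108$.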

Figures (4)

  • Figure 1: A simple example DAG for pebbling.
  • Figure 2: Zipper gadget consisting of $2$ input groups $S_1 = \{u_1, u_2, \ldots, u_d\}$ and $S_2 = \{u_{d+1}, \ldots, u_{2d}\}$, and a main chain $v_1, \ldots, v_{n_0}$. The edges going from the input groups are combined into a single arrow for simplicity. The extension to discourage recomputation is only illustrated for $u_1$ in gray.
  • Figure 3: Examples of consecutive levels. The 1st level (bottom row) always has size $\ell=5$. The 2nd level (top row) has $\ell'=5$ on the left side, $\ell'=7$ in the middle, and $\ell'=3$ on the right.
  • Figure 4: High-level sketch of our construction: the main tower on the left, and node/edge gadgets on the right, with the level sizes and the dependency between incident node-edge pairs also shown.

Theorems & Definitions (25)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Corollary 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 15 more