Root-to-Leaf Scheduling in Write-Optimized Trees

Christopher Chung, William Jannen, Samuel McCauley, Bertrand Simon

TL;DR

WORMS addresses the problem of efficiently flushing messages from root to leaf in write-optimized trees under the disk access machine (DAM) model, where some operations complete only after their message has traversed the entire root-to-leaf path. The authors reduce WORMS to the classic scheduling problem $P \mid \text{outtree}, p_j = 1 \mid \sum w_j C_j$ (unit-time tasks on parallel machines with out-tree precedence constraints, minimizing total weighted completion time). They obtain an $O(1)$-approximation for WORMS by composing three pieces: the reduction, a simple $4$-approximation for the scheduling problem, and a conversion back to valid WORMS schedules. They also prove that WORMS is NP-hard. The key ideas include packed sets and three partial schedules that make the reduction modular, along with a practical $4$-approximation (MPHTF) for parallel scheduling under tree precedence constraints. The results give a principled framework for handling backlogged root-to-leaf operations such as secure deletes and deferred queries, improving throughput while controlling average latency in write-optimized data structures such as $B^{\varepsilon}$-trees.

Abstract

Write-optimized dictionaries are a class of cache-efficient data structures that buffer updates and apply them in batches to optimize the amortized cache misses per update. For example, a $B^{\varepsilon}$-tree inserts updates as messages at the root. $B^{\varepsilon}$-trees only move ("flush") messages when they have total size close to a cache line, optimizing the amount of work done per cache line written. Thus, recently-inserted messages reside at or near the root and are only flushed down the tree after a sufficient number of new messages arrive. Although this lazy approach works well for many operations, some types of updates do not complete until the update message reaches a leaf. For example, deferred queries and secure deletes must flush through all nodes along their root-to-leaf path before taking effect. What happens when we want to service a large number of (say) secure deletes as quickly as possible? Classic techniques leave us with an unsavory choice. On the one hand, we can group the delete messages using a write-optimized approach and move them down the tree lazily. But then many individual deletes may be left incomplete for an extended period of time, as their messages wait to be grouped with a sufficiently large number of related messages. On the other hand, we can ignore cache efficiency and perform a root-to-leaf flush for each delete. This begins work on individual deletes immediately, but harms system throughput. This paper investigates a new framework for efficiently flushing collections of messages from the root to their leaves in a write-optimized data structure. Our goal is to minimize the average time at which messages reach the leaves. We give an algorithm that O(1)-approximates the optimal average completion time in this model. Along the way, we give a new 4-approximation algorithm for scheduling parallel tasks for weighted completion time with tree precedence constraints.
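
To make the scheduling objective concrete, the sketch below evaluates the weighted completion time $\sum_j w_j C_j$ of a schedule for unit-time tasks on parallel machines under out-tree precedence constraints. It is not the paper's MPHTF algorithm and carries no approximation guarantee. The function name `greedy_outtree_schedule`, the parent-array input format, and the priority rule (subtree weight divided by subtree size) are all assumptions made for illustration.

```python
import heapq


def greedy_outtree_schedule(parent, weight, num_machines):
    """Greedy list schedule for unit-time tasks with out-tree precedence.

    parent[j] is the index of task j's parent, or None if j is a root.
    A task may run only in a step after its parent has run.
    Returns (schedule, cost): schedule[t] lists the tasks run in step t + 1,
    and cost is the sum of weight[j] * completion_step[j].
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for j, p in enumerate(parent):
        if p is not None:
            children[p].append(j)

    # Preorder traversal; its reverse visits every task after its descendants.
    order = []
    stack = [j for j in range(n) if parent[j] is None]
    while stack:
        j = stack.pop()
        order.append(j)
        stack.extend(children[j])

    # Accumulate subtree weight and subtree size bottom-up.
    sub_w = list(weight)
    sub_n = [1] * n
    for j in reversed(order):
        p = parent[j]
        if p is not None:
            sub_w[p] += sub_w[j]
            sub_n[p] += sub_n[j]

    # Hypothetical priority: higher subtree density (weight / size) runs first.
    def key(j):
        return (-sub_w[j] / sub_n[j], j)

    ready = [key(j) for j in range(n) if parent[j] is None]
    heapq.heapify(ready)

    schedule, cost, step_no = [], 0, 0
    while ready:
        step_no += 1
        count = min(num_machines, len(ready))
        step = [heapq.heappop(ready)[1] for _ in range(count)]
        for j in step:
            cost += weight[j] * step_no
        # Children become available only from the next step onward.
        for j in step:
            for c in children[j]:
                heapq.heappush(ready, key(c))
        schedule.append(step)
    return schedule, cost


# Task 0 is the root with children 1 and 2; task 3 is a child of task 1.
parent = [None, 0, 0, 1]
weight = [0, 0, 5, 1]
print(greedy_outtree_schedule(parent, weight, 1))  # ([[0], [2], [1], [3]], 14)
print(greedy_outtree_schedule(parent, weight, 2))  # ([[0], [2, 1], [3]], 13)
```

In the example, the heavy leaf (task 2) runs as early as precedence allows, so a second machine lowers the cost only by letting task 3 finish one step sooner.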

Paper Structure (30 sections, 13 theorems, 16 equations, 4 figures)

Key Result

Lemma 1

There is a constant $c_1$ such that for any overfilling schedule $S$ for a WORMS instance $(T,M,P,B)$, we can, in $O(n \log n)$ time, construct a valid schedule $\hat{S}$ for WORMS satisfying $\mathrm{cost}(\hat{S}) \leq c_1 \cdot \mathrm{cost}(S)$.

Figures (4)

  • Figure 1: This figure shows three successive time steps during a cascade of three nodes $v_1$, $v_2$, and $\ell$. The flush that will occur in the next time step is shown with an orange dotted line. Messages that have $\ell$ as a target leaf are represented in red; all others are represented in green or blue. $v_2$ temporarily overflows on the second time step, allowing all messages to be flushed in two time steps.
  • Figure 2: Packed sets of an example WORMS instance. Each leaf $\ell$ is labelled with the number of messages that have $\ell$ as a target leaf. Packed parents are bolded and labelled with the size of their packed contents. Children of an internal packed parent are filled and are colored according to the packed set their messages belong to.
  • Figure 3: This figure shows the precedence constraints of tasks in $\mathcal{T}(T,M,P,B)$ for the WORMS instance in Figure 2; the coloring of each packed set matches between the figures, except the red packed sets, which had a leaf packed parent. All internal tasks have weight $0$; the leaves are labelled with their weight. If all descendant leaves of a task have weight $0$, the task is omitted.
  • Figure 4: A diagram of $T$ in the NP-hardness lemma. Edges in $T_2$ are given as dotted lines; edges in $T_1$ are solid. The number of messages in $M$ with a given target leaf is given below each leaf.

Theorems & Definitions (27)

  • Lemma 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • proof
  • ...and 17 more