CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers

Brian Wheatman, Randal Burns, Aydın Buluç, Helen Xu

TL;DR

This work introduces the CPMA, a batch-parallel, compressed variant of the Packed Memory Array (PMA) that preserves the PMA's cache-friendly layout while supporting efficient parallel batch updates and range queries. The CPMA applies delta encoding and byte-code compression within each leaf, which reduces memory traffic without altering the implicit tree structure. The authors give a work-efficient parallel batch-update algorithm with three phases (batch merge, counting, and redistribution) and prove that the CPMA keeps the same asymptotic bounds as the PMA for core operations. In experiments, the CPMA achieves on average about 3x faster batch inserts and 4x faster range queries than compressed PaC-trees, with strong parallel scaling up to 64 cores. The paper also builds F-Graph, a dynamic-graph system on top of the CPMA, which outperforms the PaC-tree-based and Aspen frameworks on graph workloads and updates, illustrating the practical value of memory-subsystem-aware data structures for large-scale dynamic processing.
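To make the leaf-level compression mentioned above concrete, here is a minimal sketch of delta encoding followed by a byte code, applied to one sorted leaf. It is an illustration under assumptions, not the authors' implementation: the function names `encode_leaf` and `decode_leaf` are hypothetical, the byte code shown is a common 7-bits-per-byte scheme with a continuation bit, and the first key is simply encoded as a delta from 0 (the paper's actual leaf layout may differ).

```python
def encode_leaf(sorted_keys):
    """Delta-encode sorted non-negative ints, then byte-code each delta.

    Assumed format (illustrative only): 7 payload bits per byte, with the
    high bit set when more bytes of the same delta follow.
    """
    out = bytearray()
    prev = 0
    for key in sorted_keys:
        delta = key - prev  # small when neighboring keys are close together
        prev = key
        while True:
            byte = delta & 0x7F
            delta >>= 7
            if delta:
                out.append(byte | 0x80)  # continuation bit: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)


def decode_leaf(buf):
    """Invert encode_leaf with a single left-to-right scan of the bytes."""
    keys = []
    prev = 0
    delta = 0
    shift = 0
    for byte in buf:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += delta  # prefix sum restores the original key
            keys.append(prev)
            delta = 0
            shift = 0
    return keys


keys = [1000, 1003, 1010, 1500]
packed = encode_leaf(keys)
print(len(packed))          # 6
print(decode_leaf(packed))  # [1000, 1003, 1010, 1500]
```

In this example, four keys that would occupy 32 bytes as 64-bit integers fit in 6 bytes. Decoding is a sequential scan, which matches the scan-oriented access pattern of an array-based leaf.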

Abstract

This paper introduces the batch-parallel Compressed Packed Memory Array (CPMA), a compressed, dynamic, ordered set data structure based on the Packed Memory Array (PMA). Traditionally, batch-parallel sets are built on pointer-based data structures such as trees because pointer-based structures enable fast parallel unions via pointer manipulation. When compared with cache-optimized trees, PMAs were slower to update but faster to scan. The batch-parallel CPMA overcomes this tradeoff between updates and scans by optimizing for cache-friendliness. On average, the CPMA achieves 3x faster batch-insert throughput and 4x faster range-query throughput compared with compressed PaC-trees, a state-of-the-art batch-parallel set library based on cache-optimized trees. We further evaluate the CPMA compared with compressed PaC-trees and Aspen, a state-of-the-art system, on a real-world application of dynamic-graph processing. The CPMA is on average 1.2x faster on a suite of graph algorithms and 2x faster on batch inserts when compared with compressed PaC-trees. Furthermore, the CPMA is on average 1.3x faster on graph algorithms and 2x faster on batch inserts compared with Aspen.

Paper Structure

This paper contains 42 sections, 5 theorems, 13 figures, 15 tables.

Key Result

Lemma 1

Given a batch of $k$ sorted elements, the work of the batch-merge phase is $O(k \log n)$, and the span is $O(\log k \log n)$.
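One way to build intuition for these bounds is a divide-and-conquer routing of the sorted batch. The sketch below is sequential and makes several assumptions: the name `route_batch` is hypothetical, leaves are represented only by their smallest keys (`leaf_heads`), and the paper's actual batch-merge algorithm and proof may differ. The idea is to search for the leaf of the batch median, peel off every batch element belonging to that leaf, and recurse on the two remaining halves, which are independent and could run in parallel.

```python
from bisect import bisect_left, bisect_right


def route_batch(leaf_heads, batch):
    """Assign each element of a sorted batch to a leaf index.

    leaf_heads[i] is assumed to be the smallest key of leaf i (sorted).
    Returns (assignment, stats). stats counts searches over the leaves and
    the deepest recursion level reached.
    """
    assignment = {}
    stats = {"searches": 0, "max_depth": 0}

    def recurse(lo, hi, depth):
        if lo >= hi:
            return
        stats["max_depth"] = max(stats["max_depth"], depth)
        mid = (lo + hi) // 2
        # One binary search over the leaves for the median of this sub-batch.
        leaf = max(bisect_right(leaf_heads, batch[mid]) - 1, 0)
        stats["searches"] += 1
        # Key range covered by that leaf (first and last leaf are open-ended).
        leaf_lo = leaf_heads[leaf] if leaf > 0 else float("-inf")
        leaf_hi = (leaf_heads[leaf + 1]
                   if leaf + 1 < len(leaf_heads) else float("inf"))
        # Contiguous slice of the sub-batch that lands in this leaf.
        start = bisect_left(batch, leaf_lo, lo, hi)
        end = bisect_left(batch, leaf_hi, lo, hi)
        assignment[leaf] = batch[start:end]
        # The two sides are independent; a parallel version would fork here.
        recurse(lo, start, depth + 1)
        recurse(end, hi, depth + 1)

    recurse(0, len(batch), 1)
    return assignment, stats


assignment, stats = route_batch([0, 10, 20, 30], [1, 2, 11, 25, 26, 35])
print(assignment)  # {2: [25, 26], 0: [1, 2], 1: [11], 3: [35]}
print(stats)       # {'searches': 4, 'max_depth': 3}
```

Each call removes at least the median from its sub-batch, so there are at most $k$ searches over the leaves, and each side of the recursion has at most half the elements, so the depth is $O(\log k)$. With an $O(\log n)$ search per call, this is consistent with $O(k \log n)$ work and $O(\log k \log n)$ span if the two recursive calls run in parallel. The sketch only routes elements. It does not model merging inside a leaf or the later counting and redistribution phases.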

Figures (13)

  • Figure 1: Insert throughput as a function of batch size.
  • Figure 2: Range query throughput as a function of range size.
  • Figure 3: Example of an insertion in a PMA with leaf density bound of 0.9 and leaf size of 4.
  • Figure 4: Example of batch insertion in a PMA with leaf density bound of 0.9 and leaf size of 4. After the merge, there are more elements in the second leaf than the leaf size, so the number of elements is stored in the leaf, and the elements are stored out-of-place until the redistribute.
  • Figure 5: An example of the work-efficient counting algorithm for batch updates. The blocks at the top represent the PMA leaves and the dots represent elements in the PMA. The pink blocks with arrows represent leaves that were touched during a batch update. The tree below the PMA is the implicit PMA tree of nodes labeled with a tuple of (height, index) (indices are assigned left to right). The blue solid circles represent PMA nodes that must be counted because their sibling or child violated its density. The tan dotted circles represent PMA nodes that did not need to be counted.
  • ...and 8 more figures

Theorems & Definitions (5)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 5