CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers
Brian Wheatman, Randal Burns, Aydın Buluç, Helen Xu
TL;DR
This work introduces the CPMA, a batch-parallel, compressed variant of the Packed Memory Array (PMA) that keeps the PMA's contiguous, cache-friendly layout while supporting efficient parallel batch updates and range queries. The CPMA applies delta encoding and byte-code compression within each leaf, which reduces memory traffic without changing the implicit tree structure. The authors give a work-efficient parallel batch-update algorithm (batch merge, counting, redistribution) and prove that the CPMA keeps the same asymptotic bounds as the PMA for core operations. In experiments, the CPMA is on average approximately 3x faster on batch inserts and 4x faster on range queries than compressed PaC-trees, with strong parallel scaling up to 64 cores. The paper also builds F-Graph, a dynamic-graph system on top of the CPMA that outperforms frameworks based on compressed PaC-trees and Aspen on graph algorithms and batch updates, which illustrates the practical value of memory-subsystem-aware data structures for large-scale dynamic processing.
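To make the leaf-level compression mentioned above concrete, here is a minimal Python sketch of delta encoding followed by a byte code (7 payload bits per byte, with the high bit as a continuation flag). It is an illustration under stated assumptions, not the CPMA's actual leaf format: the function names `encode_varint`, `compress_leaf`, and `decompress_leaf` are hypothetical, and for simplicity the first key is byte-coded like every other delta, which may differ from how the paper stores leaf heads.

```python
from typing import Iterable, List


def encode_varint(value: int) -> bytes:
    """Byte-code a non-negative integer: 7 payload bits per byte,
    high bit set when more bytes follow (least significant group first)."""
    if value < 0:
        raise ValueError("only non-negative integers can be byte-coded here")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups remain
        else:
            out.append(byte)         # final group, continuation bit clear
            return bytes(out)


def compress_leaf(sorted_keys: Iterable[int]) -> bytes:
    """Delta-encode a sorted run of keys, then byte-code each difference.
    Assumption: the first key is stored as a delta from 0."""
    out = bytearray()
    prev = 0
    for key in sorted_keys:
        if key < prev:
            raise ValueError("keys must be non-negative and sorted")
        out += encode_varint(key - prev)
        prev = key
    return bytes(out)


def decompress_leaf(data: bytes) -> List[int]:
    """Invert compress_leaf with one sequential pass over the bytes."""
    keys: List[int] = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7               # delta continues in the next byte
        else:
            prev += value            # delta complete: rebuild the key
            keys.append(prev)
            value = 0
            shift = 0
    return keys


keys = [1000, 1003, 1010, 1500, 100000]
blob = compress_leaf(keys)
restored = decompress_leaf(blob)
```

Decoding is a single left-to-right pass over contiguous bytes, which is the access pattern a scan or range query over a leaf would use; small gaps between neighboring keys cost one byte each instead of a full machine word.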
Abstract
This paper introduces the batch-parallel Compressed Packed Memory Array (CPMA), a compressed, dynamic, ordered set data structure based on the Packed Memory Array (PMA). Traditionally, batch-parallel sets are built on pointer-based data structures such as trees because pointer-based structures enable fast parallel unions via pointer manipulation. When compared with cache-optimized trees, PMAs were slower to update but faster to scan. The batch-parallel CPMA overcomes this tradeoff between updates and scans by optimizing for cache-friendliness. On average, the CPMA achieves 3x faster batch-insert throughput and 4x faster range-query throughput compared with compressed PaC-trees, a state-of-the-art batch-parallel set library based on cache-optimized trees. We further evaluate the CPMA compared with compressed PaC-trees and Aspen, a state-of-the-art system, on a real-world application of dynamic-graph processing. The CPMA is on average 1.2x faster on a suite of graph algorithms and 2x faster on batch inserts when compared with compressed PaC-trees. Furthermore, the CPMA is on average 1.3x faster on graph algorithms and 2x faster on batch inserts compared with Aspen.
