BCL: A Cross-Platform Distributed Container Library

Benjamin Brock, Aydın Buluç, Katherine Yelick

TL;DR

BCL tackles the challenge of irregular parallel applications by offering a cross-platform, coordination-free library of distributed data structures built on one-sided RDMA primitives. It combines a lightweight internal DSL (BCL Core) with a serialization layer (ObjectContainers) to provide generic, high-performance structures such as hash tables, queues, and Bloom filters across MPI, OpenSHMEM, GASNet-EX, and UPC++ backends. The framework supports multiple atomicity levels via concurrency promises, enabling optimized implementations for phasal workloads, and demonstrates strong performance on ISx, genome assembly, and microbenchmarks across several HPC systems. The work shows that BCL can match or exceed hand-tuned domain-specific implementations while maintaining portability and ease of integration, potentially broadening the use of high-level data structures in PGAS-like and MPI-based codes.

Abstract

One-sided communication is a useful paradigm for irregular parallel applications, but most one-sided programming environments, including MPI's one-sided interface and PGAS programming languages, lack application-level libraries to support these applications. We present the Berkeley Container Library (BCL), a set of generic, cross-platform, high-performance data structures for irregular applications, including queues, hash tables, Bloom filters, and more. BCL is written in C++ using an internal DSL called the BCL Core that provides one-sided communication primitives such as remote get and remote put operations. The BCL Core has backends for MPI, OpenSHMEM, GASNet-EX, and UPC++, allowing BCL data structures to be used natively in programs written using any of these programming environments. Along with our internal DSL, we present the BCL ObjectContainer abstraction, which allows BCL data structures to transparently serialize complex data types while maintaining efficiency for primitive types. We also introduce the set of BCL data structures and evaluate their performance across a number of high-performance computing systems, demonstrating that BCL programs are competitive with hand-optimized code, even while hiding many of the underlying details of message aggregation, serialization, and synchronization.

Paper Structure

This paper contains 47 sections, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Organizational diagram of BCL.
  • Figure 2: Process for pushing values to a BCL FastQueue. First (1) a fetch_and_add operation is performed, which returns a reserved location where values can be inserted. Then (2) the values to be inserted are copied to the queue.
  • Figure 3: Our bucket sort implementation in BCL for the ISx benchmark.
  • Figure 4: A small change to user code (inserting into the HashMapBuffer instead of the HashMap) causes inserts to be batched together.
  • Figure 5: Performance comparison on the ISx benchmark on three different computing systems. All runs measure weak scaling with $2^{24}$ items per process.
  • ...and 6 more figures
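
The two-step push described in the Figure 2 caption (reserve a slot range with fetch_and_add, then copy the values in) can be illustrated with a minimal single-process sketch. This is not BCL code: SketchQueue and its members are hypothetical names, std::atomic stands in for a remote atomic on the queue's tail counter, and std::copy stands in for a one-sided remote put. Capacity handling here is a simplification and says nothing about how the real FastQueue manages space.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, shared-memory stand-in for a distributed queue.
// In a real one-sided setting, tail_ and data_ would live on a remote
// process and be accessed with remote atomics and remote puts.
template <typename T>
class SketchQueue {
public:
  explicit SketchQueue(std::size_t capacity) : data_(capacity), tail_(0) {}

  // Push a batch of values; returns false if the batch does not fit.
  bool push(const std::vector<T>& values) {
    const std::size_t n = values.size();

    // Step 1: atomically reserve n consecutive slots.
    // fetch_add returns the tail value before the addition, which is
    // the first index of the reserved range.
    const std::size_t start = tail_.fetch_add(n);

    if (start + n > data_.size()) {
      // Simplification: undo the reservation when the batch overflows.
      tail_.fetch_sub(n);
      return false;
    }

    // Step 2: copy the values into the reserved range.
    // No other pusher can hold the same range, so no lock is needed.
    std::copy(values.begin(), values.end(), data_.begin() + start);
    return true;
  }

  // Number of reserved slots (may be transiently inflated while a
  // failed push is being rolled back).
  std::size_t size() const { return tail_.load(); }

  // Read an element; only meaningful once all pushes have completed.
  const T& at(std::size_t i) const {
    assert(i < tail_.load());
    return data_[i];
  }

private:
  std::vector<T> data_;
  std::atomic<std::size_t> tail_;
};
```

With a SketchQueue<int> of capacity 8, pushing {1, 2, 3} and then {4, 5} leaves five elements in insertion order, and a further push of four elements is rejected. Because only step 1 is atomic, readers in this sketch must wait until all pushes have finished before consuming, which mirrors the kind of phase-separated usage the TL;DR mentions.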