Efficient Distributed Data Structures for Future Many-core Architectures

Panagiota Fatourou, Nikolaos D. Kallimanis, Eleni Kanellou, Odysseas Makridakis, Christi Symeonidou

TL;DR

The paper tackles scalable data-structure design for future many-core architectures that lack global cache coherence. It introduces directory-based and token-based approaches, together with hierarchical extensions, for implementing stacks, queues, deques, and lists across islands, using mechanisms such as elimination, combining, DMA batching, and per-end token control. It provides formal models, linearizability proofs, and an extensive experimental evaluation on a $512$-core Formic prototype, showing when hierarchical strategies yield throughput and energy advantages. The work offers practical blueprints for next-generation concurrency utilities and runtime support, highlighting how structure and locality influence scalability in non-cache-coherent environments.
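The per-end token control mentioned above can be pictured with a small sequential simulation. This is a minimal sketch, not the authors' algorithm: the class name `TokenDeque`, the integer island ids, and the rule that a token moves directly to whichever island requests it are assumptions made for illustration, and real concurrency and message latency are not modeled. It only shows the bookkeeping idea: each end of the deque has its own token, and consecutive operations by the same island on the same end need no token transfer.

```python
from collections import deque


class TokenDeque:
    """Sequential simulation of per-end token control (hypothetical design).

    Each end ("left" or "right") has its own token. An island may operate on
    an end only while holding that end's token; taking a token held by another
    island counts as one token transfer (a message in a distributed setting).
    """

    def __init__(self, initial_holder=0):
        self.items = deque()
        self.holder = {"left": initial_holder, "right": initial_holder}
        self.transfers = 0  # token hand-offs between islands so far

    def _acquire(self, end, island):
        # Transfer the token only if a different island currently holds it.
        if self.holder[end] != island:
            self.holder[end] = island
            self.transfers += 1

    def push(self, end, island, value):
        self._acquire(end, island)
        if end == "left":
            self.items.appendleft(value)
        else:
            self.items.append(value)

    def pop(self, end, island):
        self._acquire(end, island)
        if not self.items:
            return None  # empty deque: report EMPTY rather than raise
        return self.items.popleft() if end == "left" else self.items.pop()


d = TokenDeque()
d.push("left", 0, "a")   # island 0 already holds both tokens: no transfer
d.push("right", 1, "b")  # right token moves from island 0 to island 1
d.push("right", 1, "c")  # island 1 still holds the right token: no transfer
print(d.pop("left", 2), d.transfers)  # left token moves to island 2
```

Because the two ends have independent tokens, islands working on opposite ends do not contend for a single token; when the deque holds only a few elements, so that both ends touch the same items, a real concurrent implementation would need extra coordination that this sketch ignores.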

Abstract

We study general techniques for implementing distributed data structures on top of future many-core architectures with non-cache-coherent or partially cache-coherent memory. With the goal of contributing towards what might, in the future, become the concurrency utilities package in Java collections for such architectures, we obtain a comprehensive collection of data structures by considering different variants of these techniques. To achieve scalability, we study a generic scheme that makes all our implementations hierarchical. We consider a collection of known techniques for improving the scalability of concurrent data structures and adapt them to our setting. Our experiments show that some of these techniques indeed have a high impact on scalability, and they also reveal the performance and scalability benefits of the hierarchical approach. Finally, we present experiments that study the energy consumption of the proposed techniques, using an energy model recently proposed for such architectures.
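As a rough illustration of why a hierarchical scheme can help, the sketch below assumes one combiner per island that buffers enqueue requests from its island's clients and forwards them to a single global server in batches. The names `GlobalQueueServer` and `IslandCombiner`, the batch size, and the island count are hypothetical and chosen only for illustration; the sketch handles enqueues only (dequeues would also need responses routed back) and counts only the messages received by the global server.

```python
from collections import deque


class GlobalQueueServer:
    """Single server that owns the queue and counts messages it receives."""

    def __init__(self):
        self.items = deque()
        self.messages = 0

    def apply_batch(self, batch):
        self.messages += 1  # one message carries the whole batch
        for value in batch:
            self.items.append(value)  # preserves each island's FIFO order


class IslandCombiner:
    """Hypothetical island-local combiner: buffers enqueues from its island's
    clients and forwards them to the global server in fixed-size batches."""

    def __init__(self, server, batch_size):
        self.server = server
        self.batch_size = batch_size
        self.pending = []

    def enqueue(self, value):
        self.pending.append(value)
        if len(self.pending) == self.batch_size:
            self.flush()

    def flush(self):
        # Forward any buffered requests, including a partially filled batch.
        if self.pending:
            self.server.apply_batch(self.pending)
            self.pending = []


server = GlobalQueueServer()
combiners = [IslandCombiner(server, batch_size=16) for _ in range(4)]
for seq in range(100):
    for island, combiner in enumerate(combiners):
        combiner.enqueue((island, seq))
for combiner in combiners:
    combiner.flush()
print(server.messages, len(server.items))
```

In this toy setup (4 islands, 100 enqueues each, batches of 16), the global server receives 28 messages instead of the 400 it would receive if every client contacted it directly; the trade-off is extra intra-island communication and the delay of waiting for a batch to fill.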

Paper Structure

This paper contains 38 sections, 30 theorems, 2 equations, 14 figures, and 35 algorithms.

Key Result

Lemma 1

The linearization point of a push (pop) operation $op$ is placed within its execution interval.
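Lemma 1 is the kind of property a linearizability argument rests on: each operation is assigned a linearization point inside its execution interval, and the operations ordered by those points must form a legal sequential execution. The sketch below checks both conditions for a recorded stack history. It is a generic illustration, not taken from the paper; the `Op` record and `check_stack_history` are hypothetical names, linearization points are assumed to be distinct, and a pop on an empty stack is assumed to return `None`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Op:
    kind: str       # "push" or "pop"
    value: Any      # pushed value, or value returned by pop (None = EMPTY)
    invoke: float   # invocation time
    respond: float  # response time
    lin: float      # chosen linearization point


def check_stack_history(ops):
    """Return True if every linearization point lies inside its operation's
    execution interval and the induced sequential order is a legal stack run."""
    # Condition 1 (the property stated by Lemma 1): invoke <= lin <= respond.
    if any(not (op.invoke <= op.lin <= op.respond) for op in ops):
        return False
    # Condition 2: replaying in linearization order on a sequential stack
    # must reproduce every value that a pop actually returned.
    stack = []
    for op in sorted(ops, key=lambda o: o.lin):
        if op.kind == "push":
            stack.append(op.value)
        else:
            expected = stack.pop() if stack else None
            if expected != op.value:
                return False
    return True


h = [
    Op("push", 1, invoke=0, respond=3, lin=1),
    Op("push", 2, invoke=2, respond=5, lin=4),
    Op("pop", 2, invoke=3, respond=6, lin=5),
]
print(check_stack_history(h))
```

Moving the pop's linearization point outside `[3, 6]` violates the first condition, while placing it at `3.5` (before the second push) keeps it inside the interval but makes the replay return `1` instead of `2`, so both checks are needed.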

Figures (14)

  • Figure 1: Performance evaluation of (a) distributed queue implementations, (b) distributed queue implementations while executing different amounts of local work ($512$ cores).
  • Figure 2: Performance evaluation of (a) distributed stack implementations with elimination, (b) distributed stack implementations without elimination.
  • Figure 3: Total number of messages received in the proposed implementations by all servers in the system for $10^7$ operations.
  • Figure 4: Scalability factors of presented algorithms.
  • Figure : Push operation for a client of the directory-based stack.
  • ...and 9 more figures
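Figure 2 compares distributed stacks with and without elimination. One way to picture elimination at a directory-style server is the batch routine below: pushes and pops that arrive in the same batch are paired off and answered directly, without touching the stored stack. This is a minimal sketch under stated assumptions, not the paper's implementation: `serve_batch` is a hypothetical name, every request in a batch is assumed to be pending concurrently (so the server may order them freely), and each client is assumed to have at most one outstanding request.

```python
def serve_batch(stack, requests):
    """Serve one batch of stack requests with elimination (hypothetical).

    requests: list of (client_id, "push", value) or (client_id, "pop", None).
    Returns (responses, eliminated_pairs); responses maps client_id to "ok"
    for pushes and to the returned value (None = EMPTY) for pops.
    """
    pushes = [r for r in requests if r[1] == "push"]
    pops = [r for r in requests if r[1] == "pop"]
    responses = {}
    pairs = min(len(pushes), len(pops))
    # Elimination: the i-th pop takes the i-th push's value directly.
    for (push_client, _, value), (pop_client, _, _) in zip(pushes, pops):
        responses[push_client] = "ok"
        responses[pop_client] = value
    # Only one of these two loops can run: leftover pushes or leftover pops
    # are applied to the stored stack.
    for push_client, _, value in pushes[pairs:]:
        stack.append(value)
        responses[push_client] = "ok"
    for pop_client, _, _ in pops[pairs:]:
        responses[pop_client] = stack.pop() if stack else None
    return responses, pairs


stack = [10]
reqs = [(1, "push", "a"), (2, "pop", None), (3, "push", "b"),
        (4, "pop", None), (5, "pop", None)]
resp, eliminated = serve_batch(stack, reqs)
print(resp, eliminated, stack)
```

Each eliminated pair can be linearized as the push immediately followed by its matching pop, which leaves the stored stack unchanged; the leftover operations are then applied in order. In this example, two of the three pops are served without accessing the stored stack.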

Theorems & Definitions (48)

  • Lemma 1
  • Lemma 5
  • proof
  • Theorem 6
  • Lemma 7
  • proof
  • Lemma 10
  • proof
  • Lemma 12
  • proof
  • ...and 38 more