Exploring the Design Space for Message-Driven Systems for Dynamic Graph Processing using CCA

Bibrak Qamar Chandio, Maciej Brodowicz, Thomas Sterling

TL;DR

This work argues that irregular, dynamic graph workloads outstrip conventional von Neumann architectures and proposes the Continuum Computer Architecture (CCA), a memory-centric, message-driven model with a global address space that exposes fine-grain parallelism. It outlines a hardware design space built from tessellated Compute Cells (CCs) that form a memory–compute–communication continuum, and introduces batched dynamic graph processing via actions and Recursively Parallel Vertex Objects (RPVOs). The paper combines a synthesis of theory and systems work with an exploration of the hardware design space, detailing trade-offs in cell shape, memory, and communication, and demonstrates batched dynamic BFS on a CCASimulator, highlighting how dynamic data movement and asynchronous execution can improve performance on dynamic graphs. It also identifies concrete future directions (reducing NoC diameter, adaptive routing, and wafer-scale deployments) that could lead to scalable non-von Neumann accelerators for graph-centric AI workloads.

Abstract

Computer systems that have been successfully deployed for dense, regular workloads fall short of achieving scalability and efficiency when applied to irregular and dynamic graph applications. Conventional computing systems rely heavily on static, regular, numerically intensive computations, whereas parallel graph applications executed on High Performance Computing systems exhibit little spatial or temporal locality and are fine-grained and memory-intensive. With the strong interest in AI, which depends on these very different use cases, combined with the end of Moore's Law at nanoscale, dramatic alternatives in architecture and underlying execution models are required. This paper identifies an innovative non-von Neumann architecture, Continuum Computer Architecture (CCA), that redefines the nature of computing structures to deliver a new generation of highly parallel hardware architecture. CCA reflects a genus of highly parallel architectures that, while varying in specific quantities (e.g., memory blocks), share multiple attributes not found in typical von Neumann machines. Among these are memory-centric components, message-driven asynchronous flow control, and lightweight out-of-order execution across a global name space. Together, these non-von Neumann architectural properties, guided by a new execution model, are intended to chart a path for extending beyond the von Neumann model. This paper documents a series of interrelated experiments that together establish future directions for next-generation non-von Neumann architectures, especially for graph processing.

Paper Structure

This paper contains 12 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Execution signatures showing serial and parallel portions. Red is serial and blue is parallel.
  • Figure 2: Idea of the compute continuum. Storage and computation happen in a medium of computing cells. The medium can range from a continuum of a large number of tiny computing cells to a single big cell.
  • Figure 3: Shapes of Compute Cells (CC) and their mesh tessellations.
  • Figure 4: A $5\times6$ chip shown as an exemplar. Compute Cells containing local memory along with computing logic are tessellated in a mesh network.
  • Figure 5: Relationship between the number of mesh-connected, square-shaped Compute Cells (CCs), the mesh NoC diameter, and the total chip memory capacity within an area of under $306\,\mathrm{mm}^2$. Some data points are annotated with additional details for context. The x-axis is in log scale.
  • ...and 2 more figures
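
To make the NoC diameter mentioned in the Figure 4 and Figure 5 captions concrete, the following is a minimal sketch of how the diameter of a plain 2D mesh grows with the number of Compute Cells. It assumes a mesh without wraparound (torus) links and minimal hop-by-hop routing; the function names `mesh_diameter` and `square_mesh_diameter` are hypothetical and are not taken from the paper or the CCASimulator. It does not model the memory capacity or area figures, which are specific to the paper's data.

```python
import math


def mesh_diameter(rows: int, cols: int) -> int:
    """Hop count between opposite corners of a rows x cols 2D mesh.

    Assumption: no wraparound links, so the longest shortest path
    runs from one corner to the diagonally opposite corner.
    """
    if rows < 1 or cols < 1:
        raise ValueError("mesh dimensions must be positive")
    return (rows - 1) + (cols - 1)


def square_mesh_diameter(num_cells: int) -> int:
    """Diameter of a square mesh holding exactly num_cells cells."""
    side = math.isqrt(num_cells)
    if side * side != num_cells:
        raise ValueError("num_cells must be a perfect square")
    return mesh_diameter(side, side)


if __name__ == "__main__":
    # The 5 x 6 exemplar chip from the Figure 4 caption.
    print("5x6 mesh diameter:", mesh_diameter(5, 6))
    # Diameter grows roughly as 2 * sqrt(N) for square meshes.
    for n in (16, 256, 4096, 16384):
        print(n, "cells ->", square_mesh_diameter(n), "hops")
```

Under these assumptions, quadrupling the number of cells roughly doubles the diameter, which is one reason the future directions named in the TL;DR include reducing NoC diameter.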