TEGRA -- Scaling Up Terascale Graph Processing with Disaggregated Computing
William Shaddix, Mahyar Samani, Marjan Fariborz, S. J. Ben Yoo, Jason Lowe-Power, Venkatesh Akella
TL;DR
The paper tackles terascale graph processing, where traditional CPUs and GPUs struggle to meet real-time requirements. It proposes TEGRA, a scale-up graph accelerator that uses disaggregated memory and a communication fabric inspired by Active Messages to reduce communication overhead and memory stranding. Key contributions include the design of a composable, scale-up architecture, an exploration of memory technologies and interconnects, and comparative performance evaluations on graph workloads such as breadth-first search (BFS), betweenness centrality (BC), and single-source shortest paths (SSSP). The work argues that decoupling compute from memory, combined with specialized messaging and heterogeneous memory, can enable efficient processing of graphs with trillions of edges.
Abstract
Graphs are essential for representing relationships in various domains, driving modern AI applications such as graph analytics and neural networks across science, engineering, cybersecurity, transportation, and economics. However, the size of modern graphs is rapidly expanding, posing challenges for traditional CPUs and GPUs in meeting real-time processing demands. As a result, hardware accelerators for graph processing have been proposed. However, the largest graphs that these systems can handle are still modest, often on the scale of the Twitter graph (approximately 1.4B edges). This paper aims to address this limitation by developing a graph accelerator capable of terascale graph processing. Scale-out architectures, in which nodes are replicated to accommodate larger datasets, are a natural choice for handling larger graphs. We argue that this approach is not appropriate for very large-scale graphs because it leads to underutilization of both memory and compute resources. Additionally, vertex and edge processing have different access patterns, and communication overheads pose further challenges in designing scalable architectures. To overcome these issues, this paper proposes TEGRA, a scale-up architecture for terascale graph processing. TEGRA leverages a composable computing system with disaggregated resources and a communication architecture inspired by Active Messages. By employing direct communication between cores and optimizing memory interconnect utilization, TEGRA reduces communication overhead and improves resource utilization, thereby enabling efficient processing of terascale graphs.
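To make the Active Messages idea concrete, the following is a minimal software sketch of BFS in which each message names a handler that runs on the core owning the destination vertex. It is not TEGRA's design: the names owner, send, and visit, the modulo vertex partitioning, and the level-synchronous loop are all assumptions chosen for illustration.

```python
def owner(vertex, num_cores):
    # Hypothetical partitioning: vertex v lives on core v mod num_cores.
    return vertex % num_cores


def active_message_bfs(adjacency, source, num_cores=4):
    """Level-synchronous BFS where cores interact only through messages.

    adjacency maps a vertex id to an iterable of neighbor ids.
    Returns a dict of hop distances for every reachable vertex.
    """
    # Each simulated core stores distances only for the vertices it owns.
    dist = [dict() for _ in range(num_cores)]
    # Messages sent during the current level, delivered at the next level.
    next_inboxes = [[] for _ in range(num_cores)]

    def send(dest_core, handler, *args):
        # An active message carries the handler to run plus its arguments.
        next_inboxes[dest_core].append((handler, args))

    def visit(core, vertex, level):
        # Handler executed on the owning core when the message arrives.
        if vertex in dist[core]:
            return  # already reached at an earlier or equal level
        dist[core][vertex] = level
        for neighbor in adjacency.get(vertex, ()):
            send(owner(neighbor, num_cores), visit, neighbor, level + 1)

    send(owner(source, num_cores), visit, source, 0)
    while any(next_inboxes):
        # Swap buffers so every message handled in this pass has the same level.
        inboxes, next_inboxes = next_inboxes, [[] for _ in range(num_cores)]
        for core, inbox in enumerate(inboxes):
            for handler, args in inbox:
                handler(core, *args)

    merged = {}
    for per_core in dist:
        merged.update(per_core)
    return merged
```

In this sketch a core never reads another core's distance table; all cross-core interaction goes through send, which loosely mirrors the direct core-to-core communication the abstract describes.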
